A FORMAL THEORY OF PERCEPTION
William Arthur Rottmayer

Stanford University 
Stanford, California 94305

CHAPTER 1 
BACKGROUND AND MOTIVATION 

Work on this problem began as a joint effort of three people, 
Patrick Suppes, George Huff, and myself. The characterization of the
problem is due primarily to Professor Suppes. The particular method we
chose to attack it grew out of discussions among the three of us, and it
is difficult to separate the contributions of each. This method consisted
of constructing a particular model. Once the model was agreed upon, it
was possible to independently obtain results about it, and most of the
results contained herein are due to my own efforts.

The approach we took to the problems of perception concerns itself 
much more with scientific work than has been customary in the approach
taken to these problems by recent philosophers. For this reason, it is 
useful to discuss the conditions that led us to this approach before turning 
to the details of the work we did. These considerations were not made 
explicit before we began, but were definitely there in the backs of our 
minds. This explicit account is my own creation, but it was obtained by 
reflecting on the common work we did. Thus it is an accurate account of 
my own motivation and a more or less satisfactory account of Professor 
Suppes and Mr. Huff's motivation. This account breaks down into three 
parts. First, there is a rough characterization of the dominant themes 
in the recent philosophical approach to perception and then our approach 
is compared and contrasted with this approach. Secondly, a brief account 
of the scientific work that influenced us is given. This is a good method 
of showing the main features of our work, and is also useful since many 
philosophers are perhaps unfamiliar with much of this material. Finally,
there is a detailed discussion of why we felt our approach is advantageous
in trying to solve certain of the problems of perception. This chapter
is divided into three sections, corresponding to these three topics. In
the following discussion, it is understood that the entire discussion
concerns perception. The things I say are meant to apply only to the
philosophical discussion of perception, and are not applicable in any way
to other philosophical problems, unless a claim to the contrary is made.
I do not maintain that what I say applies to problems other than perception
simply because I have no way of supporting such a view. Indeed, I believe
that many of the things I say concerning perception are not true if
applied without restriction to other philosophical problems. In any case,
there is no reason to bring up the more general view in this paper, since 
it concerns itself entirely with the problem of perception. 

Section 1 

For convenience in discussing different approaches to the problem 
of perception, it is instructive to think of talk about perception as
occurring in one of three languages: the language of physics and
physiology (PP), the language of psychology and computer science (PC), 
and ordinary English (OE). PP contains talk of light waves stimulating 
the retina and electrical impulses being transmitted to the brain along 
the optic nerve. PC contains talk of the inputs and outputs of information 
processing systems, and how these systems can be altered by learning. 
Another way of characterizing PC is to say PC talks of perception in the
same way Chomsky talks of language. Of the three languages, PC is the 
newest and least developed, and thus the most unfamiliar. Hence, the 
above characterization is not completely satisfactory. However, it does 
give a rough idea and what I have in mind will become clearer as the 
paper progresses. OE is well known to philosophers. This threefold 
division of perceptual talk is not the only one possible, and it is 
certainly true that none of the three languages has been precisely defined 
and that there are significant borderline cases. This division is useful 
in stating my view of the philosophical problem of perception and how it
should be approached, however, and that is all that is necessary. 



There is no peculiarly philosophical language in the above division. 
The reason is that philosophical problems do not arise in a special 
language; they arise in a language that is already being used in a non- 
philosophical way. Philosophers may invent special terms for talking 
about the non-philosophical language in order to facilitate their 
discussion. The basic problems, however, are problems that are statable, 
perhaps in an imprecise way, in the non-philosophical language. I believe
this is true of philosophical problems in general, and that the problem
of perception is not exceptional in this regard. The particular threefold 
division into OE, PC, and PP was chosen because of its special relevance 
to perception, however, and would probably be unsatisfactory for most 
other uses. Using this threefold division as the framework for the 
discussion, the question arises of how philosophy fits into this
framework. Some philosophical problems deal with the interrelationship
of the three languages. Issues involving questions of reduction fall in 
this category. If one arranges the three languages in order of complexity, 
OE is the simplest, PC is next, and PP is the most complex. Thus, if one 
were interested in the problems of reduction, there are two things that
could be done: reduce OE to PC, or reduce PC to PP. Reducing OE to PP
would simply be a matter of combining these two steps. However, we are
not interested in reductionism, so the interrelationship between the 
different languages is not an important factor in our work. The remaining 
philosophical problems must be statable in at least one of the remaining 
languages. Which language is the likely candidate? PP isn't, for two 
reasons. First, it is not possible to state the philosophical problems 
of perception in PP, since in this language talk of even ordinary aspects 
of perception becomes unmanageably complicated. Indeed, in the present 
state of affairs, it is not even clear how one would go about translating 
philosophical problems into PP. Secondly, the conceptual framework of PP 
is well worked out, and once it is possible to deal with a problem in PP, 
there are no longer philosophical mysteries surrounding it. This 
preliminary discussion has thus led to the position that the interesting 
philosophical problems are statable in either OE or PC, or both. The 
real problem is which of these three possibilities is correct. My own 
position is that philosophical problems arise in both OE and PC, but that
the most important problems arise in PC. I do not want to dispute the
claim that some of the philosophical problems of perception arise in OE,
but I do disagree with the view that the problems of perception of
primary philosophic interest arise in OE. Thus, I think philosophers
working on perception should work both in OE and in PC, with more emphasis 
on the latter than the former. 

This position is different from the one prevalent in twentieth
century British-American philosophy, which is that philosophical problems,
including the philosophical problems of perception, arise in OE. The
prevalence of the view that philosophical problems arise in OE is closely
related to two other beliefs which are characteristic of English
philosophy in this century: namely, that there is a sharp distinction
between philosophy and science, and the rejection of the causal theory of
perception. The reason for this connection is clear. As far as the
philosophically interesting problems of perception are concerned, PC is
the language of science. If philosophers work in PC, then there will
be no clear separation between their work and scientific work. This is
not to say that the two will be identical, for presumably the philosopher's
approach and goals will differ from the scientist's. If the philosophers
confine their attention to OE, then there will be a sharp boundary between
their work and the scientist's work. This boundary will be at least as
sharp as the boundary between OE and PC, which is fairly clear. Thus,
the belief that philosophers should work in OE goes hand in hand with
the belief that there is a sharp distinction between philosophy and
science. Secondly, it is also fairly clear that the theory of perception
which is implicitly contained in OE, if there is in fact such a theory,
is not a causal theory. H. P. Grice is the only modern philosopher I
know of who has attempted to give a causal account of perception in OE,
and by his own admission, his theory is very far from the spirit of the
original theory. The theory implicit in PC is a causal theory with
the original spirit, i.e., it is a genuine causal theory. I will have
more to say concerning the causal theory later.

H. P. Grice, "The Causal Theory of Perception," Perceiving, Sensing,
and Knowing, ed. Robert J. Swartz (New York: Doubleday, 1965), p. 472.



To sum things up, scientists work in PC or PP, and accept the causal
theory. Philosophers have worked in OE, rejected the causal theory, and
correctly recognized that if this is correct, there is a sharp distinction 
between philosophy and science. My own view is that there is no such 
sharp distinction, that philosophers should work in PC as well as OE, 
and that the causal theory is correct. 

The work we have been doing on the problem of perception is in the 
language PC. This work is not an isolated attempt to deal with some of
the problems of perception, but is part of a unified approach to the 
whole problem. Two features of this work can be illustrated by 
contrasting it with the classic materialist doctrine. Materialism, when 
restricted to perception and stated in the present framework, is the 
claim that statements in both OE and PC can be reduced to statements in 
PP. There are two differences between our approach and the materialist 
program. The first difference is that our approach is not, like 
materialism, a reduction. It is not an attempt to reduce OE to PC. 
Rather, it is an attempt to state and solve classical philosophical 
problems within PC. Perhaps OE could be reduced to PC, but this is 
irrelevant to what we are trying to accomplish. Secondly, materialists 
have always claimed that PP was adequate for all talk of perception in
principle, but have not tried to carry out the necessary reduction in
detail. In solving specific problems of perception, it is not helpful
to know whether or not a particular reduction is possible in principle;
the only thing that would be of use would be an actual reduction. Our 
approach deals with specific problems and is useful when one has to 
deal with these problems. The fact that materialism is of no use in 
dealing with specific problems is perhaps the main reason for the 
twentieth century philosophical concentration on OE and the consequent 
split with science. Speaking in the present framework, at the beginning
of this century there were only two languages which philosophers such as
G. E. Moore could work in: PP and OE. There was no way to work in PP,
so OE was the only possibility. It has proven very difficult to deal
with all the problems of perception in OE, but fortunately it is not 
necessary to make the choice that confronted Moore, for PC is now 
available. This language can be applied to specific problems. Two such
problems are the problem of synthetic a priori knowledge and the problem
of sense data. Both of these problems are difficult to state, let alone
solve, in OE. There is nothing in OE that corresponds to the predicates
synthetic and a priori in any straightforward fashion, for there is no
need for such concepts in ordinary discourse. OE also contains no sense
data terms. Thus, it is very difficult to discuss these problems in OE,
for the language does not even provide an adequate conceptual framework
in which to state these problems. It is my belief that PC does provide
such a framework. Later, I will give a model, drawn from PC, of the
perceptual process which serves as a satisfactory framework in which to
discuss these problems. Given this model, it is easy to see to what part
of the perceptual process the terms 'synthetic a priori' and 'sense data'
apply, and hence to see precisely what the problems are. Moreover,
this model indicates in a general way what a satisfactory solution would
look like. The outlook is not completely optimistic, however, for to
actually get an explicit solution to these questions would require a much
more well-developed theory. This will require a lot of work, and what
we have done is only the beginnings of a complete theory.

Section 2

In a situation from either ordinary life or a psychological experiment,
it is often convenient to divide human activity into perceptual input,
the processing of this input together with information stored in memory,
and the resulting output. Ignoring the output device, an organism capable
of such activity can be thought of as consisting of three parts: the
perceptual component, memory, and the processing device. This paper deals
with the perceptual component, which we believe is the least understood
part. A computer provides at least a rough first approximation to the
processing device, and there are also roughly adequate models for memory.
There is currently no such model for the perceptual component, not even
a very rough first approximation model that will provide a framework for
dealing with the problems of perception. Providing such a model is much
too large a problem to deal with all at once, so we have restricted
ourselves to a small part of the problem.

It is natural to divide the perceptual component into five parts,
corresponding to the five senses. Of these parts, the visual part
occupies a place of salient importance, and it has been widely discussed
in both the philosophical and the scientific literature. Thus, we
decided to concentrate completely on the visual part, and this decision
has guided our subsequent thinking. The fact, which now appears evident
to me, that our model applies equally well to the tactile part is simply
a happy coincidence. It occurred to me only after we settled upon
the approach we have taken. This fact was made possible by our decision
to concentrate on geometry, which is at least intuitively based on both 
our visual and tactile experiences. It really results from the particular 
starting point we chose, as I will explain shortly. Right now, I want to 
give some motivation for concentrating on geometry. 

Figuratively speaking, our idea is that visual perception has many 
factors, and that geometry is what ties them all together. More 
accurately, it provides the framework to which all the other factors must 
be attached in order to come up with a satisfactory model for the whole 
visual part. This conception is the basis of much of the scientific work
in the area. Moreover, philosophers have long attributed central
importance to vision and to geometry. This is almost self-evident, but
a few remarks concerning it are in order. Locke calls vision 'the most
comprehensive of the senses,' and one of Berkeley's major works is an
essay concerning it. More generally, the typical example used in
philosophical discussion of perception is almost always an example from
visual perception, as in the Moore case below. The importance of geometry
isn't quite so evident until one realizes that philosophers used to talk
of 'extension' and nowadays talk of 'space' and 'spatial relations'
instead of geometry. This is primarily a terminological point, however, 
since extension was regarded as the subject matter of geometry just as 
spatial relationships are now. Thus, Descartes and Kant, whom I will 
discuss later, are good examples of philosophers who assign a crucial 
role to geometry. More generally, any philosopher who uses spatial 
properties to individuate sense data or physical objects shares this 
viewpoint to some extent. G. E. Moore is a typical example. In a general 
discussion of what happens when we perceive an object, he confines his 
attention to the particular case of what happens when we see an envelope.
The importance of position, size, and shape is evident throughout the
discussion, but their overriding importance comes out in Moore's
definition of a material object: "I propose, then, to define a material
object as something which (1) does occupy space; (2) is not a sense datum
of any kind whatever; and (3) is not a mind, nor act of consciousness."
Moore admits that this is an incomplete definition, but it is interesting
that (1) is the only positive element in a definition that is supposed 
to be at least partially satisfactory. 

The best way to characterize our particular approach is to contrast 
it with two other scientific approaches to the same problem. The first 
of these is the artificial intelligence approach. This work is done 
primarily by computer scientists, and it is concentrated in two places, 
Massachusetts Institute of Technology and the Stanford Research Institute. 
The goal is to write a computer program that has roughly the visual 
capabilities of the human brain. The computer uses a television camera 
for an eye, and the problem is to get a camera-program combination that
can do the same kinds of tasks that the eye-brain combination can. There 
are two features of this work I want to mention. Our approach is the 
same as the artificial intelligence one in regard to the first of these, 
but completely different in regard to the second. The major problem 
encountered is getting the computer to be able to divide the scene it 
is presented with into regions that go together in the way humans can. 
This is necessary if the computer is going to be able to distinguish
physical objects by just looking at them. Geometry plays a crucial role 
in this problem, and this is further justification for concentrating our 
efforts on it. Indeed, on this approach the primary reason for 
investigating our other visual abilities, such as the ability to recognize
colors and textures, is that these abilities provide us clues as to how
to divide the visual scene into different regions and about the spatial 
orientation and relationships of these regions. Thus, for instance, a 
sudden change in color, texture, or light intensity is not of interest
by itself, but is interesting because it indicates a boundary between two
regions. On this analysis, it is natural to divide the computer's task
into two distinct and quite independent tasks: drawing in boundary lines
and then analyzing the resulting line drawing into bunches of regions that
go together, i.e., are faces of the same physical object. It is true that
in solving a definite problem the computer will go back and forth between
these two tasks; for example, it will draw in an edge of a cube that
doesn't show up in the first drawing on the basis that an edge is needed
to make the analysis of the whole scene satisfactory and that a finer
check of the place in the scene where this line ought to appear reveals
some indication a line should be there. This sort of interaction not
only works, but it is intuitively very appealing, since it seems people
operate in the same way, i.e., if they aren't satisfied with the picture
they get from a quick glance at a scene, they go back and inspect it in
detail. However, this sort of interaction doesn't alter the fact that
the two tasks are conceptually quite independent. This point suggests that
it would be wise to study the two tasks separately, and solve the larger
problem by combining the answers to the two smaller ones. We accept the
above analysis, and the course we took was to concentrate on two-dimensional
line drawings and thus on the second of the two tasks. I believe this
discussion is worth emphasizing, for at first glance, it is not at all 
clear how the specialized model we deal with, which concerns itself 
entirely with straight line drawings, can be regarded as part of a general 
theory of visual perception. We do regard it as such, and as the above 
discussion makes clear, have definite ideas on the place it would occupy 
in a complete theory. It is interesting to note that Helmholtz came to 
much the same viewpoint as the result of extensive optical experiments 
nearly a hundred years ago. He noticed that people are very attentive 
to visual characteristics that indicate how what they see is divided into 
physical objects or give clues concerning the size, shape, and distance of 
these objects. Indeed, adults process these clues so automatically that 
they can describe much more accurately the objective sizes and shapes of 
objects than they can the subjective visual phenomena. This habit is so 
engrained that it takes years of practice to even be aware, to even see, 
in the ordinary sense of the word, the subjective visual phenomena. Most
people are as unconscious of these phenomena as they are of the blind
spot, and it is one of the main purposes of artistic education to bring
these phenomena to consciousness. I will say more of Helmholtz's views
in Section 3.

The second distinguishing feature of the artificial intelligence
approach is that it is interested solely in building a machine that can
do the tasks in question, not in building one that can learn to do
them. We are interested in the latter task. It is clear that humans
have to learn many of the facts they use in analyzing a visual scene, and
thus only a learning device of some sort can be a completely satisfactory
model. This is not an easy thing to do, however, and we have felt
compelled to deal with far simpler problems than the artificial
intelligence people are currently dealing with. The upshot of this is
that our work is really a complement for the artificial intelligence
approach, rather than a competitor for it.

The second approach I want to contrast ours with is the perceptron
approach. Actually, it is much more accurate to say that Minsky and
Papert's book Perceptrons is what influenced us, rather than the
perceptron approach. The following characterization of the perceptron
approach is taken mostly from their book. The approach is like ours
in that it emphasizes learning. A perceptron is in fact a simple sort
of learning device. What it is supposed to do is come up with an answer
to a complicated question after being given the answer to a lot of
simpler questions. Suppose there are n of these simpler questions and
each one is of the form 'does the predicate $F_i$ ($1 \le i \le n$) hold?'
If it does, $F_i = 1$; if not, $F_i = 0$. The perceptron has n
coefficients $a_i$, and it computes an answer to the complicated question,
which is of the form 'does the predicate G hold?' by computing
$\sum_{i=1}^{n} a_i F_i$. If the sum is
greater than some number k, the perceptron answers yes (G = 1); if not,
no (G = 0). It learns by changing the $a_i$'s, i.e., it is given an initial
value for each $a_i$ and k and run through a number of trials, being told
the correct answer after each trial and altering the coefficients on this
basis according to some preordained strategy. Machines of this type have
a surprising amount of power. They can, in fact, given the appropriate
predicates, learn to play championship checkers. It was widely believed
a decade or so ago that they could learn just about anything. People
clung to this belief even though it remained largely unsubstantiated, and
this fact led Minsky and Papert to write Perceptrons, in which the
inadequacy of perceptrons for certain tasks was clearly shown. This
was done by showing that given a certain natural perceptual setup, 
which I will briefly describe, perceptrons cannot satisfactorily learn 
geometrical predicates. 
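
The perceptron just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the text leaves the
"preordained strategy" unspecified, so the familiar error-correction rule
is assumed here, and the names perceptron_answer and train are
hypothetical, not taken from Minsky and Papert.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (values of F_1..F_n, correct value of G)


def perceptron_answer(a: Sequence[float], k: float, f: Sequence[int]) -> int:
    """Answer 1 (yes) if the weighted sum of the F_i exceeds k, else 0 (no)."""
    total = sum(a_i * f_i for a_i, f_i in zip(a, f))
    return 1 if total > k else 0


def train(examples: Sequence[Example], a: Sequence[float], k: float,
          rate: float = 1.0, max_passes: int = 100) -> Tuple[List[float], float]:
    """Run repeated trials, adjusting the a_i (and k) after each wrong answer.

    Assumed strategy: the standard error-correction rule. The source only
    says the coefficients are altered by "some preordained strategy".
    """
    a = list(a)
    for _ in range(max_passes):
        mistakes = 0
        for f, correct in examples:
            error = correct - perceptron_answer(a, k, f)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # Raise coefficients of active predicates after a wrong "no",
                # lower them after a wrong "yes"; move k the opposite way.
                a = [a_i + rate * error * f_i for a_i, f_i in zip(a, f)]
                k -= rate * error
        if mistakes == 0:  # every trial answered correctly; stop early
            break
    return a, k
```

For instance, with two simple predicates and G defined as "both hold,"
starting from zero coefficients and k = 0, repeated passes over the four
possible cases settle on coefficients that answer every case correctly.
The limitation discussed in the text is not that such learning fails to
converge, but that for some predicates G no suitably simple $F_i$ exist.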

This setup is a simplified model of the retina or a television 
camera. A two-dimensional plane is divided into squares (for the present
purposes, the shape is inessential, and squares were chosen for 
convenience) and the processing device is told, for each square, that it 
is black or white. Given this information, it should be possible to
compute the value of certain simple predicates $F_i$, and from these, the
perceptron should be able to compute the value of a more complicated 
predicate G. The question now is how to characterize simplicity in this 
setup. One answer is that one predicate is simpler than another if its 
value depends on the color of fewer squares. Intuitively, it also seems 
desirable to localize these squares, e.g., requiring that they be 
adjacent. The first notion is sufficient, however, because Minsky and 
Papert showed that if G is the predicate 'connected,' then it is
necessary for one of the $F_i$ to depend on the whole retina if a perceptron
is going to be able to compute G correctly. This is completely unsatis-
factory, since all the $F_i$'s must be simpler than G if the perceptron is
going to be able to accomplish anything substantive. Thus, the setup 
must be altered in some way, and what we have done is replace both the 
model of the retina and the perceptron. 
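
To make the retina setup concrete, here is a minimal Python sketch. It
assumes a retina represented as a grid of 0s (white) and 1s (black), a
"simple" predicate that looks only at a listed handful of squares, and a
direct computation of 'connected' by flood fill. The flood fill is not a
perceptron; it is shown only to indicate what the predicate G asks, and
why its value can turn on squares anywhere in the retina. All names, and
the choice of edge-adjacency as the meaning of 'connected,' are
assumptions made for illustration.

```python
from collections import deque
from typing import List, Sequence, Tuple

Retina = List[List[int]]  # 1 = black square, 0 = white square


def local_predicate(retina: Retina, squares: Sequence[Tuple[int, int]]) -> int:
    """A simple predicate: 1 iff every listed square is black.

    Its value depends only on the color of the squares in `squares`.
    """
    return int(all(retina[r][c] == 1 for r, c in squares))


def connected(retina: Retina) -> int:
    """G = 'connected': 1 iff the black squares form one edge-adjacent region.

    Assumed convention: an all-white retina counts as not connected.
    """
    black = [(r, c) for r, row in enumerate(retina)
             for c, v in enumerate(row) if v == 1]
    if not black:
        return 0
    seen = {black[0]}
    queue = deque([black[0]])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < len(retina) and 0 <= nc < len(retina[0])
            if inside and retina[nr][nc] == 1 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    # Connected exactly when the fill from one black square reaches them all.
    return int(len(seen) == len(black))
```

A vertical bar of black squares is connected; whitening any single square
in its interior breaks it in two. Since that square may lie anywhere, no
fixed small set of squares suffices to decide the answer for every figure.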

Instead of the above model of the retina, we decided to deal only 
with straight line figures, and to take the notions of straight line and 
intersection as primitive. I have already discussed the motivation for 
dealing with line drawings. The reason for having only straight lines 
is that we felt that solving this special problem would be a big step 
toward solving the more general problem, and that this special problem
was complicated enough. In its final form, we regard the learning
device as simply being presented with all the information concerning the
straight lines and their intersections. This is the reason that our
model is applicable to tactile as well as visual perception. For, given
a drawing with raised lines, a braille drawing, a person could gather
the information we regard as being presented by touch. The fact that
this would require motion, and hence take time, is not essential, since
all we require is that the device at some time have all the information
at its disposal, not that it gather it all at once. This will require
some memory, but memory is necessary anyway. Moreover, if one accepts
Helmholtz's hypothesis that movement of the eye is necessary to be able
to perceive visual straightness, there is no difference in the memory
requirement for either type of perception. It is true that we do learn
to recognize fairly accurately that some lines are straight without
moving the eye, just as we can feel that some edges are straight without
moving the hand. The above discussion is really about primitive visual
straightness and primitive tactile straightness, i.e., the perceptual
phenomena on which the idea of straightness ultimately rests. 

Helmholtz's hypothesis is not universally accepted. Our work could
be indirectly useful in establishing whether or not it is true. We were
originally interested in the question of how people scan straight-line
drawings. When presented with a drawing, people don't simply look at
one point on it, but their eyes move back and forth across it. The motions
used, and why they are used, are not well understood at all. It
seems reasonable, if Helmholtz is correct, to expect that the primary
purpose of some of the motions is to decide which lines are straight.
This would require special eye movements. If one knew what the other
factors were which determined how a figure is scanned, it would be easy
to recognize such movements. Our interest in the scanner (Chapter 2)
was motivated by a desire to know what some of these other factors were.
We finally had to abandon the hope of solving this question when it
became apparent that it was necessary to solve the problems we finally
dealt with before there was any hope of dealing with the scanning
problem. This problem is still an interesting one, both in itself and
for the light it will shed on Helmholtz's hypothesis, and I believe our
work will provide good background material for solving it.

Fred Roberts and Patrick Suppes, "Some Problems in the Geometry of
Visual Perception," Synthese, 17 (1967), 177.

Ibid., p. 178.

We also discarded the perceptron as the model of the learning 
device. Instead, we took the finite state automaton (fsa) to be the 
model for what the learning device should be at asymptote. The
justification for this move is discussed at length in Chapter 3. Once
this move is made, the obvious problem is to find a way to code the 
information in a straight-line drawing in such a way that it can be put
on the input tape of an fsa. Finding such a coding, discovering its 
geometrical properties and finding convenient methods for determining 
which predicates a given coding has are the central points covered in 
Chapter 2. It is necessary to have convenient methods for determining
these predicates before attempting to build a device that can learn to
recognize them. It is difficult to build a learning device when one
knows the method and operations that it uses; it is virtually impossible
otherwise. Thus, the material on this subject is an important step
towards our goal. The operations we eventually used on the codings are
set-theoretical in nature and would require the full power of a Turing
machine, not an fsa, to execute. This is irrelevant as far as the
coding problem is concerned, however, since the input tape of a Turing
machine is exactly the same as that of an fsa. There were strong reasons
for this switch, however, since it does have the unfortunate effect of
creating a gap between the work we did on the coding and the work we
did on learning. I will say more about this gap, and how it might be
filled, in Chapter 
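
For readers unfamiliar with the device, the following Python sketch shows
what it means for coded information to be "put on the input tape of an
fsa." The tape alphabet here is purely hypothetical ("L" for a line
symbol, "X" for an intersection symbol) and is not the coding developed
in Chapter 2; the class and state names are likewise assumptions made for
illustration.

```python
from typing import Dict, Iterable, Set, Tuple


class FiniteStateAutomaton:
    """A deterministic fsa: reads a tape left to right, one symbol at a time."""

    def __init__(self, start: str, accepting: Set[str],
                 transitions: Dict[Tuple[str, str], str]) -> None:
        self.start = start
        self.accepting = accepting
        self.transitions = transitions

    def accepts(self, tape: Iterable[str]) -> bool:
        state = self.start
        for symbol in tape:
            key = (state, symbol)
            if key not in self.transitions:
                return False  # no move defined for this symbol: reject
            state = self.transitions[key]
        return state in self.accepting


# Hypothetical example: recognize tapes containing at least one "X".
has_intersection = FiniteStateAutomaton(
    start="none",
    accepting={"some"},
    transitions={
        ("none", "L"): "none",
        ("none", "X"): "some",
        ("some", "L"): "some",
        ("some", "X"): "some",
    },
)
```

The point of the sketch is only that an fsa sees its input as a finite
string of symbols and keeps no record beyond its current state, which is
why the choice of coding matters so much.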

The last point in this section is that the effort to come up with 
a coding is interesting in itself, apart from the specific purpose of
applying it to results in learning theory, which is what motivated us.
It is obvious that the picture theory, which says that there are triangles
in the brain when one is looking at triangles, is false. At least there
is no evidence to support the belief that it is true. If one believes
that what is really present in the brain is different electrical states
of the nerves, there is no reason to suppose there are triangles present.
Indeed, it is difficult to see what such a triangle would look like,
and not at all clear what would be explained by positing the existence
of such a triangle. Moreover, one would really have to hold that the
triangle was somehow transmitted bodily from the retina to the brain in
order to make such a belief plausible; if what is transported is merely
a coded electrical impulse from which the triangle is reconstructed in
the brain, the brain might just as well operate directly on the coded
electrical impulse, since it contains the necessary information. Besides,
the motivation for believing there are triangles in the brain is that
it is hard to see how to code a triangle in an electrical impulse, and
such a coding must already exist unless the triangle is present at all
points along the optic nerve. It will have to be able to jump across
synapses, too. I'm not sure there is anybody who would actually hold
such a theory, but it is a very natural way to look at the problem, so
these remarks are perhaps worthwhile. Granting that there are no
triangles in the brain, it is interesting to try to do geometry in
strange contexts that resemble more closely what the actual coding might
be. I believe the coding we use is closer to the actual coding, although
it certainly isn't too close. It might have some of the same general
properties, however. What I can say, though, is that thinking about our
strange-looking coding has the very desirable effect of freeing one's
mind from the picture theory, which has a tendency to influence one's
thinking after it has been consciously rejected. This can be of
philosophical value, as the discussion of abstract ideas in the next
section indicates.



Section 5 



The purpose of this section is to justify the claim made in Section
1 that it is advantageous to deal with the philosophical problems of
perception in PC. It consists of specific examples of problems which
I believe are more appropriately dealt with in PC than in OE and a few
general remarks on why I think this is the case. To provide a framework
for this discussion, the first order of business is to state the natural
PC model of the perceptual process.

Speaking in a common sense way, perception is a process, at one end
of which there is a physical object, e.g., a table (which I call the
object), and at the other end what a person seeing the table is conscious,
or aware, of (which I call the percept). In PP, this process is a
continuous one, and hence extremely complicated and difficult to work
with. In PC, however, the process can be broken down into four parts,
as in the following diagram:

Figure 1

Components of the Perceptual Process

The way I understand the input to M is this: Imagine the
description of a person looking at a table that would occur in PP. Light
waves emanate from the table, are refracted at the crystalline lens,
strike the retina, from which certain electrical impulses are transmitted
to the brain, and a certain state of the brain results which corresponds
to the percept. The percept is dependent on the person's previous learning
(Hume and Kant would say 'experience,' but it seems to me that the
word 'learning' is more accurate), for it is a well-known fact that
people with different backgrounds and training are aware of different
things when looking at the same object. Thus, somewhere in the perceptual
process, learning has to take place. It can't take place before the
light waves hit the retina. Thus, somewhere in the retina, the optic
nerve, or the brain, there is the first place at which learning can take
place. The state of the electrical impulse right before it reaches this
point is what corresponds to the input to M, for after this point what
happens occurs in M, which is the learning device. The relation R
between the object and the input is primarily subject matter for
physicists and physiologists, and hence it should be studied in PP. Many
of the ordinary philosophical examples of illusions, such as a stick in
water appearing bent and railroad tracks appearing to converge, concern
themselves with R. Illusions that do arise because there is not a
perfect correspondence between object and input do not cause any
conceptual difficulties, as far as I can see, and thus do not seem to
be philosophically interesting. The output of M corresponds to the
percept, as it is the last thing in the process. The relation S between
the input and the percept is dealt with in many of the psychological
examples of illusions, such as the figure-ground distinction. This
relation seems to me to be the philosophically significant one. The way
to study it is to study M, and this is the main thrust of our work.
Figure 1 represents all of what I called the perceptual component in
the discussion at the beginning of Section 2. The learning and processing
that occurs in M is primarily unconscious. The output of M is the input
for the conscious processing device, which I referred to simply as the
processing device in the earlier discussion. As far as the classical
philosophical theories are concerned, this picture is closest to
representative realism, since the input 'represents' the object.
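
The four-part decomposition described above (object, relation R, input
to M, relation S, percept) can be made concrete with a small sketch.
This is only a toy illustration under stated assumptions: the names
relation_R, LearningDevice, and perceive_object, and the use of strings
to stand for impulses and percepts, are hypothetical and are not part
of the model developed in this work. The sketch shows only the
structural point that R involves no learning, while the percept
produced by M depends on what M has previously learned.

```python
from dataclasses import dataclass, field
from typing import Dict


def relation_R(obj: str) -> str:
    """Relation R: object -> input to M (fixed; no learning occurs here)."""
    return f"impulse({obj})"


@dataclass
class LearningDevice:
    """M: realizes the relation S from input to percept.

    The `learned` table is a hypothetical stand-in for whatever internal
    state previous learning has produced.
    """
    learned: Dict[str, str] = field(default_factory=dict)

    def learn(self, inp: str, percept: str) -> None:
        # Record which percept this device associates with an input.
        self.learned[inp] = percept

    def percept(self, inp: str) -> str:
        # Inputs that M has not learned to classify yield a default percept.
        return self.learned.get(inp, "unclassified pattern")


def perceive_object(obj: str, m: LearningDevice) -> str:
    """Whole process: object --R--> input --S (via M)--> percept."""
    return m.percept(relation_R(obj))


# Two devices with different learning histories receive the same input.
carpenter = LearningDevice()
carpenter.learn(relation_R("table"), "oak trestle table")
novice = LearningDevice()
novice.learn(relation_R("table"), "table")

percepts = (perceive_object("table", carpenter),
            perceive_object("table", novice))
```

Here both devices receive exactly the same input for the table, yet
their percepts differ, which is the sense in which people with
different training are aware of different things when looking at the
same object.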

The first problem I want to consider is the question of the perceptual
given and in particular the question of sense data. My sources are the
first chapter in H. H. Price's Perception,7 and Helmholtz's Physiological
Optics.8 Price is representative of the dominant themes in recent
philosophy, while Helmholtz holds a more scientifically oriented view.
They can be taken as arguing for opposing theses concerning perception,
and hence data. This seems to be Price's position, for he mentions
Helmholtz by name and purports to refute Helmholtz's theory. I don't
believe that this is an accurate description of what actually occurs,
however. I will give the reasons for this belief after discussing Price's
argument.

7 H. H. Price, Perception (London: Methuen & Company, 1954).

8 H. von Helmholtz, Physiological Optics, trans. James P. C. Southall
(Menasha, Wisconsin: The Optical Society of America, 1924-25).

The first order of business is to tell what a sense-datum is.
Price gives some examples, and then says, "This peculiar and ultimate
manner of being present to consciousness is called being given, and
that which is thus present is called a datum."9 He then says he can't
give a positive argument for the belief that there are sense data, but
can answer arguments against this belief. There are two theses, an a
priori one, and an empirical one, of which he says, "Either of these
theses, if established, would be very damaging."10 I will not discuss the
a priori thesis, which is uninteresting. Price characterizes the
empirical thesis this way: "This (thesis) maintains that it is in fact
impossible to discover any data. For if we try to point at an instance,
it is said, we shall have to confess that the so-called datum is not
really given at all, but is the product of interpretation."11 He
attributes such an argument to idealists, but I think, in fact, it is
easier to understand it if one is not an idealist, e.g., from Helmholtz's
point of view. He then gives three arguments, and says "So far, we have
been attacking the critics of the Given upon their own ground. And that
ground is this. They begin by assuming that there is a distinction
between 'the real given' or the given-as-it-is-in-itself on the one hand,
and 'what the given seems to be' on the other."12 He then gives his
most important argument, which is that "The distinction between the Given
as it really is and what the Given seems to be is altogether untenable."13
I must confess that when the argument is put in these terms, I have
difficulty in seeing how to resolve the issue one way or the other.
However, it seems to me that the essential point of the anti-sense
data argument is that there are two things in perception that must be
kept clearly separate, and this is something which I believe is true.
In the above model, the two things are the input to M and the percept.
In more familiar terms, the two things are the actual stimulation
of the retina, what Quine calls the "ocular irradiation pattern," and
what we are actually conscious of perceiving. I note that the retinal
stimulation doesn't necessarily correspond to the input, but using it
makes the necessity for having two elements more obvious. That these two
things are distinct, I think, is undeniable, but I will give some material
from Helmholtz in support of it. This is the reason why I believe the
anti-sense data position is easier to hold if one is not an idealist,
since an idealist would have trouble making sense of the phrase 'retinal
stimulation.' In a later chapter, Price mentions Helmholtz specifically
while discussing causal theories of perception, i.e., theories that say
we must infer what we are finally aware of. He says "The theory may say,
with Helmholtz and others, 'You do infer but you are not conscious of
inferring, because you do it so quickly and without any effort.' This
will not do. If we are not conscious of inferring, what evidence is there
that we do infer at all? And if it be replied 'Of course you do, for all
consciousness of matter must be inferential,' we must point out that this
begs the question."15 The conclusion one is supposed to draw from the
above paragraph is that there is no evidence that we do infer. The only
thing I can imagine that Price had in mind when he wrote this is that
it is obviously impossible to get any direct introspective evidence that
there is an inference since the inference is, by hypothesis, unconscious.
To conclude from this that there is no evidence is clearly mistaken,
however. All it shows is that introspective evidence is impossible, and
thus that the evidence one has to adduce must be of a different, and in a
way indirect, nature. There is a whole body of such evidence in favor of
Helmholtz's view, much of it contained in Physiological Optics, which
Price simply ignores. For example, consider the phenomenon of the blind
spot. People simply fill in this hole in the visual field to look like
the surrounding area. This certainly takes place unconsciously, and if
Price's argument against Helmholtz is correct, it follows that we could
have no evidence that this occurs. This is manifestly false. The only
thing we can't have is direct introspective evidence.

It therefore appears at first sight that Price has put forward an
extremely bad argument. If one takes him to be arguing against
Helmholtz on Helmholtz's own ground, this is certainly the case.
Helmholtz was working in a scientific context, his language being a
combination of PP and PC. If one puts Price's argument in this context,
it is immediately obvious that it is a bad argument. The example of the
blind spot was, after all, drawn from the scientific realm. There is a
better explanation of what has occurred here, however. This explanation
is plausible, puts things in context, and gives deeper insight into what
was really going on. Moreover, it doesn't imply the highly improbable
conclusion that Price is guilty of such an obvious blunder. In reading
Price, one gets the unmistakable impression that Price's arguments are
simply irrelevant to Helmholtz's position, not that they are bad arguments
against Helmholtz. The reason is simply, to go back to what was said in
Section 1, that Price is working in OE, while Helmholtz is working in PP
and PC. Thus, Price's argument about unconscious inferences makes
perfect sense in OE, which is where he is working, but is manifestly
unsound in PC or PP, which is where Helmholtz is working. Thus, Price's
error is not that he gives a bad argument against Helmholtz, but that he
believes he is offering an argument against Helmholtz at all. In talking
about perception, it is very easy to forget what context one is talking
in and to assume that everyone is talking in the same context that oneself
is unless one explicitly takes notice of the context in which the talk
is occurring. This is the reason for the emphasis placed on PP, PC, and
OE in this paper. I have found that such a framework is necessary if I
am going to be able to keep things in their proper contexts. I believe
Price's error consisted in thinking that Helmholtz was talking in OE.
If what is said above is correct, it is understandable that Price should
hold this view, and if it were true, his argument against Helmholtz
would, in fact, be a reasonable one. Given that Helmholtz was talking in
a scientific context, however, all arguments in OE are going to be
irrelevant to his position. The only way to refute Helmholtz would be
to argue in PP or PC, and Price has not done this. The upshot of all this
is that Price's view is a reasonable one in OE, and that Helmholtz's one
is reasonable in PP and PC. The crucial issue is where the philosophical
problems lie, and Price simply assumes that they lie in OE. Thus, he
fails to offer any arguments against someone who holds that philosophical
problems lie in PC or PP.

In Section 24, Volume II of Physiological Optics, Helmholtz lists
the results of experiments with color contrast. One particularly
interesting feature is that contrast phenomena (which are illusions, since
in them a uniform surface appears to have different colors) disappear if
distinct boundaries are drawn between the two differently colored areas.
Helmholtz says, "Incidentally, it comes out plainly in the capricious result
of these experiments how hard it is for us to make accurate comparisons of
luminosity and colour of two surfaces that are not directly in contact
with each other and have no border between them."16 It is not surprising
that not being directly in contact would have an adverse effect, but it
is surprising that a sharply defined border would be so important. The
reason is that people pay attention to color differences that aid them in
dividing what appears in the visual field into different objects and
ignore color changes that are no help in this.

In Section 26, Helmholtz gives the following as one of his basic
principles in explaining the results of optical experiments: "We are
not in the habit of observing our sensations accurately, except as they
are useful in enabling us to recognize external objects."17 This confirms
the role assigned to color vision by the artificial intelligence people
that was mentioned in Section 1. It is also, as Helmholtz remarks, one
of the main goals of an artistic education to make people aware of these
things they usually don't see. The surprising thing isn't that habit has
led people to ignore some color differences, but that this habit can
actually lead people to see different colors where only one really exists.
As Helmholtz says, regarding contrast experiments, "If the inducing field
is supposed to be an independent body, usually the contrast colour does
not come out so as to be perceived."18 If the two fields are not regarded
as being independent bodies, then the phenomenon appears. At the end of
the section, he makes an interesting remark: "To those readers who as
yet know little about the influence of psychic activities on our sense
perception, it may perhaps seem incredible that through psychic activity
a colour can appear in the visual field where there is none. The author
must beg them to suspend judgment until they have become acquainted with
the facts in Part III of this work, which will deal with sense-perception."19
In particular, Section 26, the first section in Part III, is a very good
discussion of the philosophical issues involved. In this section, there
is a long discussion justifying the claim that what a person is conscious
of is the result of an unconscious inference. It is an inductive type
of inference, but even taking this into account, Helmholtz calls what
happens in perception an inference only because what happens resembles
an argument; there is a premise, the retinal stimulation, and a
conclusion, what we are conscious of seeing. Thus, this view is
essentially the view that there are two different things that must be
distinguished in perception, and what one calls the connection between
them is a terminological question. Calling it an inference seems as
appropriate as anything else, and Helmholtz states this is the only
reason he uses the term. Hence, Price's procedure of dealing with the
view that there are two elements in perception and the causal theory
separately is not very illuminating. To give a telling argument against
Helmholtz's position would therefore require an argument against the
claim that there are two distinct elements in the perceptual process.
To do this would require very ingenious explanations of phenomena that
seemingly can only be explained by making such a distinction and would
be a very difficult task to accomplish. Price has not even attempted
to do this.

A serious discussion of whether or not there are sense data requires
some criterion for recognizing data. Such a criterion is given by
Helmholtz, for after a long discussion, he says, "My conclusion is that
nothing in our sense-perceptions can be recognized as sensation which
can be overcome in the perceptual image and converted into its opposite
by factors that are demonstrably due to experience."20 In the terminology
I have been using, this translates to "Nothing in our sense-perception can
be recognized as data which can be altered in the percept by learning."
I have changed 'overcome and be converted into its opposite' to 'altered,'
but, since it is the nature of a datum to be unalterable, the two
statements have the same meaning. There are two types of illusions,
those that disappear once we are aware of them, and those that don't.
It is difficult to regard the former as being data, but there is a
question about the latter, and it was to help answer this question that
Helmholtz formulated the above criterion. His idea is that even illusions
that don't disappear are not necessarily part of the perceptual data, for
the effects of that which is the result of years of experience may not
be negated by simply becoming aware of the fact that the habit does lead
to illusion. The experiment where people were fitted with glasses that
inverted the retinal image is a good example of the distinction that
Helmholtz has in mind, and it also serves to support his position.
When people first put on the glasses, they encountered all sorts of
difficulty, and there was simply no way to overcome these difficulties
by consciously inverting the visual field. After a few days, these
difficulties disappeared, and things appeared upright. When the glasses
were removed, the same difficulties reappeared, and again disappeared
after a few days. This shows that what we are conscious of can be
changed by experience, and hence Helmholtz's criterion, which may seem
innocent at first, would probably rule out colors and a lot of other
things as being perceptual data once the appropriate experiments are
performed. For example, experiments of the above type with contrast
phenomena, if possible to perform, would probably show that color
perception is also due to learning. Certainly, if Helmholtz's explanations
are correct, this would be the result.

Helmholtz's criterion seems to me to be the best one for determining
what data are. If one applies it to the PC model of perception, the
input to M is what should be called the data. The percept cannot be the
data, for it can be altered by learning. Thus, instead of having the
object, input and output as in Figure 1, the terminology should be object,
sense data, and percept.






It is now easy to state Price's position in this context. His
position is that the sense data and percept are identical, for he
identifies the data with what we are conscious of, which is the percept.
In OE, these things are not clearly distinguished, so Price's arguments
have force, but in PC, they are obviously distinct, so that the most
ingenious arguments lack force.

In terms of Figure 1, we now have a clear picture of what a sense
datum is. This gives a clear framework for talk about sense data, and
allows one to talk precisely concerning them with a minimum of effort.
Disputes about sense data are thus easy to state, and time is not wasted
in preliminary skirmishing whose main outcome is to make the issue
precise. This framework also provides a touchstone for easily evaluating
arguments concerning sense data, where otherwise it is difficult to
evaluate such arguments. Finally, this framework provides a context
in which the disputes concerning sense data might be solved to the
satisfaction of everyone. It should make it easier to gain knowledge
concerning specific problems by saving time that might otherwise be
wasted in dealing with ill-defined problems.

The next problem is the problem of synthetic a priori knowledge.
I am not going to discuss the general question of whether or not there
is synthetic a priori knowledge, but limit myself to the particular
question of whether or not geometrical knowledge is synthetic a priori.
The answer to the particular question will influence to a great extent
the answer to the more general question, since geometry is one of the
most likely candidates for the status of synthetic a priori knowledge.
Moreover, much of what is said concerning geometry will be applicable
to other areas of knowledge. Besides limiting the discussion to
geometrical knowledge, I will consider from all that has been written by
philosophers on this question only the view of Kant. His view is the
most important, however, since what he said influenced all the subsequent
developments. Thus, this isn't really a severe limitation.

To show that geometrical knowledge is synthetic a priori, one has
to establish the two independent claims that it is synthetic and that
it is a priori. Thus, the discussion naturally breaks down into two
parts. At the time Kant wrote, it was generally believed that geometrical
knowledge was not synthetic, but that it was a priori. It seems to me
that the claim that geometrical knowledge is synthetic is less disputable
than the claim that it is a priori, however. Thus, I believe Kant is
correct in holding this knowledge to be synthetic, but that what he says
about its being a priori needs some clarification and modification. It
is for this latter task that our work is peculiarly suited.

Kant believes that it is obvious that mathematical judgments are
synthetic if one thinks about them. He thinks that previous thinkers
had simply overlooked this fact. They were led to believe that
mathematical judgments were analytic because of the prominent role
deductive inference plays in mathematics. However, "This was a great
mistake, for a synthetical proposition can indeed be established by the
law of contradiction, but only by presupposing another synthetical
proposition from which it follows, but never by that law alone."21 Kant
divides mathematical judgments into two classes, arithmetical and
geometrical. He argues that arithmetical judgments are synthetic, and
then says, "Just as little is any principle of geometry analytical.
That a straight line is the shortest path between two points is a
synthetical proposition. For my concept, a straight line contains
nothing of quantity, but only of quality. The concept 'shortest' is
therefore altogether additional, and cannot be obtained by any analysis
of the concept 'straight line.' Here, too, intuition must come to aid
us. It alone makes the synthesis possible."22

I believe Kant is entirely correct in believing that geometrical
judgments are synthetic. Certainly, they are not logical truths, and
they are not true by definition. My own view is that they have exactly
the same status as the basic principles of any theoretical science,
e.g., Newton's three laws of motion. Certainly they were discovered
empirically, being based on Egyptian surveying techniques. Granting that
Kant is correct on this point, there is one point of difference I have
with him. Kant believes that the term 'synthetic' applies to individual
judgments. I think Quine is correct in saying that this is not a good
way to use the term, but that it should be applied to much larger units
than individual judgments. Quine says it is the whole of science,23 but
for the present purposes, all I want to maintain is that the term should
apply to all of a geometry rather than its individual propositions. For
instance, given the present state of affairs, where there is more than
one geometry, I think it would be wise to use the term 'synthetic' in
the sentence

    "Euclidean geometry is synthetic."

but not in the sentence

    "The proposition that the sum of the three angles of
    a triangle equals 180° is synthetic."

It seems to me that this point is quite unobjectionable, for Kant and
subsequent philosophers, even though they speak of individual judgments,
or sentences, as being synthetic, believe that all geometrical judgments
go together; if one is synthetic, then they all are, and vice versa.
Thus, I believe that it is simply an unfortunate oversight that Kant
applies 'synthetic' to individual judgments, for there is really nothing
in his system to lead him to do this. It is unfortunate because it
focuses one's attention on the wrong thing, and thus is very misleading.
The same point applies to the term 'a priori' as well, and it is
important for my discussion of this term.

The prevailing attitude at Kant's time was that geometrical judgments
are a priori. Thus, Kant's main concern is to show that they are synthetic,
for then he will have examples of synthetic a priori knowledge. The
criterion for deciding if a judgment is a priori is to see if it is
necessarily true. Thus, speaking of two principles of physics, Kant says
"Both propositions are not only necessary, and therefore in their origin
a priori, but also synthetic."24 Kant follows Hume in believing that any
proposition that is known from experience, i.e., empirical knowledge,
cannot be necessarily true. Kant's main concern is not in showing that
geometrical judgments are synthetic a priori, but in answering the question
of how this is possible.25

23 Willard Quine, From a Logical Point of View (New York: Harper &
Row, 1963), p. 42.

24 Immanuel Kant, Critique of Pure Reason, trans. Norman Kemp Smith,
unabridged edition (New York: St. Martin's Press, 1965), p. 54.

The fact that geometrical judgments are synthetic a priori plays an
important role in Kant's system. If one is interested simply in the
question of whether or not there is a priori knowledge, then the question
of the status of arithmetic is independent of the question of the status
of geometry, and hence it might be thought superfluous that Kant mentions
geometry specifically. This is not true for two reasons. First, for
the actual developments in the Critique of Pure Reason, Kant needs to
assume that both kinds of knowledge are synthetic a priori, for he thinks
we have two types of intuition, inner (time) and outer (space). Moreover,
arithmetical judgments are supposed to be based on inner intuition,
geometrical judgments on outer intuition. The connection between inner
intuition and arithmetical judgments is very nebulous indeed, while the
connection between outer intuition and geometry is completely
straightforward. Secondly, it intuitively seems more plausible to believe
that arithmetical judgments are analytic than that geometrical judgments
are. To convince someone who believes that arithmetical judgments are
analytic that there is synthetic a priori knowledge depends entirely on
convincing him that geometrical judgments enjoy this status.

I don't believe Kant is entirely correct in believing that geometrical
judgments are a priori. First, as remarked above, I think the term
should be applied to all of geometry, not to individual judgments. Apart
from this, the fact that non-Euclidean geometries have been discovered
and used in physics indicates that Kant is wrong. Another indication is
given by the nineteenth-century debate between Hering and Helmholtz.
Hering, among others, tried to construct a scientific theory of visual
perception on the basis of Kant's theory. This theory took the way we
perceive space as being given, rather than learned, and hence is called
by Helmholtz the intuition theory. Opposed to this was the empirical
theory, whose chief proponent was Helmholtz, which held that we must
learn to perceive space. Helmholtz was a clear winner in this debate,
for it is very difficult to explain some illusions on the basis of the
intuition theory, while the empirical theory explains them nicely. These
two developments show that Kant's particular theory is wrong, but they
don't show that his approach is wrong, i.e., that a theory similar to
Kant's is wrong. I think this latter question is the important one.

In terms of Figure 1, geometrical knowledge is indicated by the
appropriate outputs of the learning and processing device, M.26 This
output depends on the input and the internal structure of M. Since
knowledge consists of the ability to give the appropriate response to
any input, geometrical knowledge is a property of the state of M. M will
change as learning takes place, but its knowledge at any point in its
experience is a property of M at that point. The structure of M at any
given point is determined, perhaps only probabilistically, by the original
structure of M before any learning has taken place, and by the history
of the inputs M has received. Perhaps in a growing organism it will not
be easy to separate the original structure from the history of inputs;
it is possible that some learning might take place before the processing
device, perhaps area 17 of the cortex, is fully developed. I believe
that such a factor is epistemologically irrelevant, and that there is no
need to take it into account in the present discussion. Certainly, Kant
doesn't consider such a factor.

In this context, the best way to interpret a priori is not as an 
all-or-none predicate, but as a matter of degree* I am siiggesting, in 
other words, that it is best bo treat a priori the same way that Quine 
treats synthetic. Thus, geometry will be more a priori the more the 
state of M, once it has acquired geometrical knowledge, is determined 
I more by the original structure than it is by the history of inputs; it 

will be more a posteriori the more the history of inputs detemLnes the 
j state. Given this interpretation of a priori , Kant's theory is that the 

I state of M is determined entirely by the original structure. Kant can 
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allow for some sort of Uearning' by saying that it requires some 
experience to actualize geometrical knowledge. On my interpretation, 
this amounts to saying that M changes, and that these changes require 
inputs, but that how M will change is determined entirely by the original 
structure. This sort of 'learning' theory is popular with all kinds of 
intuition theories, be they theories concerning morality or causality, 
and is not peculiar to Kant. I personally canH see any justification 
for such theories, but this is largely irrelevant to the present point, 
since, as mentioned above, Kant's theory has quite conclusively been 
refuted. The opposite extreme from Kant's view is that the state of M 
is determined entirely by the history of inputs. This view has been 
stated historically by saying that the mind is a tabula rasa. This view
is completely impossible, as M has to have some structure in order to 
learn from the inputs it receives. The true view is located somewhere 
between these two extremes. If the original structure actually does
determine geometry to a great extent, which certainly seems plausible, 
then I believe it is fair to say that Kant was essentially correct, i.e., 
that he had the right approach. I don't know whether or not this is
the case, but it certainly is a possibility. 

The problem now is to decide what the correct mixture of original 
structure and history of inputs really is. One way would be to alter 
the history of inputs to different devices, and see how different the 
resulting geometries are. Psychological experiments in weird perceptual 
conditions can be regarded as attempts to do this. This is not the 
correct way to approach the problem, in my opinion. Such results will 
be quite fragmentary, unless a person's perceptual conditions are 
completely altered all the time. Otherwise, these results show only 
what happens when a device that has already learned ordinary geometry is 
briefly exposed to differing inputs, and this is a much different situation 
than actually learning geometry entirely under different conditions. 
Moreover, the experiments I am thinking of are like the one with glasses 
that inverted the retinal images, which do not naturally lead a subject 
to different geometrical assumptions. It would be interesting to see 
experiments that could have this outcome. Moreover, these results can 
at best give only a partial answer until an at least approximate
characterization of the original structure of M is available. It is my
belief, once such a characterization is available, that such experiments
will not be necessary. My suggestion is that given the characterization,
it should be possible to determine what constraints the original structure
puts on the geometries that could possibly be learned. I think this 
could be done in a way similar to how logicians treat the problem of how 
categorical a set of axioms is. Roughly speaking, an original structure
will be more categorical the fewer different possible geometries
it allows. My proposal is to say that a geometry is more a priori the
more categorical the original structure that learned it is. 

There is one refinement of this view that seems desirable. Rather
than saying that all the geometries that a certain device can learn are
equally a priori, it seems plausible to order these geometries according
to the ease with which they are learned. Thus, the geometries that are
learned more easily, which are more natural for the device, would be
regarded as being more a priori than geometries that are difficult to
learn, that are unnatural. For people, this would result in saying that
Euclidean geometry is more a priori than non-Euclidean geometries, which
is intuitively appealing. 

Note that now a priori is not used as an absolute term, but must be 
relativized to a particular learning device. Thus, one has to say 'a
priori for M,' for example, rather than simply 'a priori.' This is how
the term should be used, for it is clear that two devices could have
exactly the same geometry, and that it would be almost completely a priori
for the one and not very a priori at all for the other.

This relativized use of the term may seem a little strange, but 
actually, there is a good explanation of why it hasn't been used in this
way. When the term is used, it normally means 'a priori for people,' 
and it is tacitly assumed that all people closely resemble one another 
in the way they learn geometry. This use of the term is simply a special 
case of the more general use that I advocate, and it seems to me that the 
great concentration of attention on this one case is what led people to 
overlook the fact that a priori is actually a relative term.

This way of looking at the term 'a priori' is very similar to the
way Chomsky looks at the term 'innate.' Innate, as it was used, e.g., in
Descartes, applied originally to specific ideas, such as the idea of God.
Chomsky changes this use completely, and talks of the innate abilities
of the "acquisition device," which is the device that learns language.
Thus, he applies the term to the original structure of the acquisition
device, which is how I have used the term a priori, since it depends on the
categoricalness of the original structure. This should not be too
surprising, for both 'innate' and 'a priori' were used as opposites for
'empirical.' It seems to me that the only reason for two terms was that
they were thought to apply to different things. If one agrees with 
Quine, then it is wrong to apply such a term to either ideas or individual 
judgments. Both should be applied to whole theories, so that the 
differences between them collapse, and they become synonymous. 

The first two problems I have considered dealt with learning and
their solutions depend on a more fully developed theory. The third 
problem I will discuss has neither of these features; it doesn't deal 
with learning, and I believe a solution (at least for the admittedly 
limited context we are working in) does not require any further theoretical 
developments. In fact, I propose such a solution at the end of Chapter 2 
after the necessary preliminary work has been discussed. This problem 
arose in the British empiricists' discussion of abstract ideas. 

Locke, Berkeley and Hume thought that whenever one thought about 
a proposition concerning the abstract idea of a triangle, what one was 
actually doing was considering the image of a particular idea that one 
had in his head. The problem with this was that this particular idea 
had to have definite properties, such as being a definite size, while 
the abstract idea should have no such definite properties. They were 
unable to solve this problem. This is not surprising, for their theory 
was a form of the picture theory discussed in Section 2, and it seems to 
me that there is no way to solve their problem if one accepts the picture
theory. The solution I propose rejects the picture theory, and identifies
the abstract idea of triangle with a procedure for recognizing triangles.
The philosopher who is closest to this conception is Kant. His notion
of a schema of a concept is very close to what I have in mind, as is
evident from his definition: "This representation of a universal
procedure of imagination providing an image for a concept, I entitle
the schema of this concept."

These three examples show the utility of working in PC. I now want
to offer some general considerations that support the same point. The
first consideration is implicit in what has been said before, namely,
that the really difficult problems do not arise if one confines oneself
to OE. I think Wittgenstein is correct in believing that philosophical
problems are the product of confusion, if one restricts one's attention
entirely to OE. Moreover, he is also correct in believing that
philosophers have contributed more to creating these problems than they
have to solving them. Certainly, the ordinary user of OE sees no
serious difficulties, and I am also unable to locate them. The problem
as far as perception is concerned, I think, is that philosophers have
taken a problem that is essentially scientific in nature, e.g., the
problem of sense-data, and tried to find a solution in OE rather than
in the scientific context in which the problem arose. This is why the
issue of sense-data, for example, immediately becomes clearer when one
puts it back in PC. The first step in this philosophical approach is to
reject the scientific solution to the problem. This is a necessary
step in order to get the 'philosophical,' as opposed to the scientific,
inquiry underway, and philosophers have been aware of this fact. Price
felt compelled to refute the scientific theory, Helmholtz's causal theory,
before giving his own analysis. We have already seen the inadequacy of
his argument against Helmholtz. G. E. Moore is another example of a
philosopher who felt this step was necessary. Towards the end of his

[Footnote] This was also pointed out to me by Professor Moravcsik.
[Footnote] Kant, Critique, p. 182.

paper Some Judgments of Perception, Moore mentions two different ways
of characterizing the problem of perception. I will give these in the
opposite order he gives them, mentioning the one he takes, which is the
one that has been widely discussed by philosophers, first, since it is a
more self-contained statement. He says, "The only other suggestion I can
make is that there may be some ultimate, not further definable relation
of 'being a manifestation of,' such that we might conceivably be judging:
'There is one and only one thing of which this presented object is a
manifestation, and that thing is part of the surface of an inkstand.'"
The first possibility he mentions is, "It might, no doubt, be possible
to define some kind of causal relation, such that it might be plausibly
held that it and it alone causes this presented object in that particular
way. But any such definition would, so far as I can see, be necessarily
very complicated." This is all he says on the matter. Thus, the
relation between object and percept can be taken to be a complicated
causal one or a simpler, unanalyzable one of some unspecified sort. Moore
gives no explicit reasons for rejecting the former view, but what he says
implies he does it simply because the view is complicated. This is
understandable, since as mentioned above, the scientific theory Moore is
thinking of is a theory in FP, which is too complicated to deal with
effectively. This is no longer true, since PC is available. Thus, neither
philosopher has a convincing argument on this point, and it seems to me
that until such an argument is given, there is no reason to believe that
the philosophical problems of perception are problems of OE. There is good
reason to believe that some of them are not, as I have indicated above.

The belief that philosophy should be done in OE and that there is a
sharp distinction between philosophy and science is a twentieth century
phenomenon. It arose earlier in this century and is due in large part to
the great influence of Moore. In work before this time, no such sharp
distinction was drawn. Thus, in the works of philosophers like Descartes,
Locke and Berkeley, and scientists like Helmholtz, one finds no distinction
between philosophic questions and scientific questions. More importantly,
all four treat some questions that a philosopher like Moore would regard
as scientific next to and in the same way as they treat questions he
would regard as philosophic. For example, Locke mentions Molyneux's
problem concerning a man born blind, and Berkeley discusses this problem,
Dr. Barrow's problem and why the moon appears larger on the horizon, in
An Essay Towards a New Theory of Vision, which I think is the most
interesting work written by a philosopher in the area of visual perception.
The philosophical nature of some of Helmholtz's remarks has already been 
discussed. The fact that this earlier work was much more fruitful than 
the twentieth century work is a powerful reason for accepting the earlier 
view. Thus, when things are put in their proper historical perspective, 
it is seen that the position I take in regard to this question is much 
closer to the classic traditional position than is the view that 
philosophy should be done in OE. I therefore feel that a person who
holds this latter view is actually under stronger obligation to defend 
his approach than I am to defend mine. I have pointed out above that 
our approach resembles in many ways the approach to linguistics which 
was initiated by Chomsky. It is interesting, as Chomsky himself points
out in Cartesian Linguistics, that his approach is a return to ideas that
were prevalent before this century but that had been rejected in the
early part of this century. Moreover, the close analogy between
perception and linguistics, which has been mentioned several times, itself
has a long tradition, going back to Berkeley's conception of what is 
perceived as being the language of nature. 

There is one final point concerning the philosophical significance 
of the present work that should be made: it is not necessary that one 
should share my view that the main problems of philosophic interest are 
problems of PC in order to maintain that work such as ours is 
philosophically significant. It seems to me that the view that not all 
philosophical problems are problems of OE is sufficient. Once one accepts
this view, he is immediately struck by the fact that almost all the
philosophers working on the problem of perception are working in OE. 
This seems like a misallocation of effort, particularly since there is no
really convincing argument that philosophers should confine their
attention to OE. It is usually not very fruitful for everyone to adopt
the same approach, even if the approach is basically correct. It is
worthwhile to have people espousing the opposite view, since this will at
least serve to keep those who hold the majority view from lapsing into
dogmatic slumbers. I hope the present work at least has this minimal
effect.

This concludes the remarks concerning the motivation and background
for the work we did. The rest is an account of the specific model we
worked on, except for the section at the end of Chapter 2 concerning
abstract ideas.

CHAPTER 2
CODINGS

As mentioned in Chapter 1, we worked on two specific problems, the
coding problem and learning. However, it took us some time to accomplish
this division. Even after making the division, we went back and forth
between the two problems, but for ease of presentation, I am going to
treat them separately. The work on each problem will be presented roughly
in the order it happened, and the things we tried and found wanting will
be included. This is probably the best way of explaining what we finally
did, and in any case it will provide the background and motivation for
our eventual problem. This chapter deals primarily with the coding
problem, the next with learning.

We decided at the outset to restrict our attention to two-dimensional
straight-line figures. To get things started, we limited ourselves to
figures in which at most two lines intersect at each point, figuring it
would be wise to try to solve this simpler problem first, and later on
try to remove this somewhat artificial restriction. ('Figure' will
henceforth mean a figure of this type.) What we wanted was a device that
could learn a geometrical predicate applicable to such figures by going
through a series of trials, where on each trial it is presented with a
figure, responds yes or no, and then is told the correct answer (i.e.,
whether or not the figure presented on that trial did in fact have the
specified predicate). It would be nice if the device could learn several
predicates this way and then use them to learn more complicated predicates,
but the basic situation is when the device has no geometrical predicates
at its disposal. Using the eye-brain combination as our model, we came
up with a device with three components: scanner, memory, and processing
device, the first corresponding to the eye, the last two to the brain.
For convenience in discussing the following problems, we lumped the last
two together and called the result 'automaton.' This was due to the
fact that, in the present context, the internal structure of the memory
and processing device isn't important, only its input/output, and this
resembles that of a finite state automaton. Intuitively, the automaton
needs three abilities: on each trial it has to be able to use the
scanner to acquire information, use this information to get an answer,
and have the ability to learn from trial to trial so that it will
eventually get all right answers.

Keeping this rather vague idea of the automaton in mind, the next
task was to explicitly characterize the scanner it had at its disposal.
The scanner has to be able to receive and execute instructions from the
automaton and to report back what it sees. We decided to work with polar
rather than rectangular coordinates since it is more natural to think of
the eye moving a certain distance in a specified direction rather than
moving up a certain distance and then over another distance. The retina
of the scanner is a small circle with a special point X in the middle,
which corresponds to the fovea. One of our main concerns in writing it
out was to allow for perceptual error and indeterminacy, e.g., forming
only a rough idea of the size of an angle, and it would be fairly easy
to put these things in the following device:

1. Await instruction: Search (θ, r)    go to 2
                      Follow (θ)       go to 6
   Automaton cannot order Follow unless special point X is on line.

2. Move in direction θ until: line appears on retina    go to 4
                              distance r is covered     go to 3

3. Report Miss to automaton    go to 1

4. Move so that special point is on the nearest vertex if one
   appears in retina, or on the nearest line if not    go to 5

5. Report distance moved since last report, type of vertex
   special point is on according to pictures, and angles lines
   make with horizontal    go to 1

6. Move special point along line nearest to angle θ until vertex
   is reached    go to 5

The advantage in writing out the scanner explicitly is that it makes
it possible to state presuppositions and distinguish separate problems. 
First, it makes clear the sense of what I said in Chapter 1 about how we 
took straight as primitive. We present the scanner with straight lines 
only and simply give it the ability to follow them. Thus, its output 
is the same as that of an organism that recognizes and follows straight 
lines. Moreover, this makes it clear that the scanner doesn't correspond
exactly to the eye, since the eye can't follow straight lines by itself. 
Secondly, it is clear that the automaton is going to have to be able to 
recognize when the scanner returns to a vertex that it has already reported 
on. The main reason for having the scanner report all the distances and 
angles was to give the automaton enough information to do this. It now 
occurred to us that it would be wise to abstract from this problem, and 
simply assume that the automaton has the ability to recognize the same 
point every time the scanner is on it without worrying how it accomplishes 
this. This simplifies the automaton, but of even more importance in the 
present context is that it allows one to greatly simplify the information 
that the scanner gives the automaton. Many geometrical predicates, e.g.,
closed, connected, triangle, do not depend on the lengths of the particular 
line segments or sizes of the particular angles involved. We decided to 
study these predicates, and hence dropped the reports of distances and 
angles from the output of the scanner. Finally, it allows one to formulate
the scanning problem (how people scan figures) that was mentioned in 
Chapter 1. In this context, it simply becomes the question of how the 
automaton decides what instruction to give the scanner each time it reports. 
We were originally very interested in this problem for several reasons. 
It is not clear what a good method of scanning would be, or how people do 
it, let alone how people learn to do it. Moreover, it seems clear that 
the method would vary according to what is being looked for and what has 
already been found: e.g., one would look differently if one wanted to
know whether a figure contained a triangle than if one were interested in 
whether it was connected, and if one were interested only in whether the 
figure contained a triangle, he would stop scanning it after he found one.
Moreover, it seemed plausible to believe, in our particular case, that a
good method of scanning would simplify the learning. For instance, it would
be easy to recognize polygons if the scanner went around the perimeters
of polygons. We failed completely to make any headway in our attempt to
solve the scanning problem (about the only thing we could agree on was
that it would be good to use Follow rather than Search wherever possible),
and a little reflection convinced us that it was mistaken to tackle it in
the first place; it became apparent that as long as the whole figure was
scanned and the appropriate information stored, it made little difference
how this information was obtained as far as processing it was concerned.
Thus, it seemed prudent to keep the problems of how the automaton used the
scanner to acquire information and how it processed this information
completely distinct, and concentrate our efforts on only one of them.
From consideration of examples like those mentioned above, we concluded 
that processing the information was the first problem to be solved, since 
efficient scanning depends on knowing what to look for and hence having
some geometrical predicates already at hand. Besides, processing the 
information seemed like the more interesting problem, and ignoring how it 
was acquired allowed us to forget about the scanner altogether. We now 
simply regarded all the information, excluding distances and angles, which
could be gotten from the scanner as simply given to the automaton. The 
problem now was to decide in what form this information should be given, 
i.e., how it should be encoded. 
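
For reference, the scanner procedure listed earlier in this chapter can be sketched as a single instruction cycle: the automaton issues Search or Follow, and the scanner answers with either Miss or a report. The sketch below is only an illustration of that control flow. The class ToyEye, its method names, and the placeholder string "Report" are assumptions introduced here; they stand in for the retina and its movements, which the text does not specify in any computational detail.

```python
class ToyEye:
    """Hypothetical stand-in for the retina; not part of the original device."""

    def __init__(self, line_in_reach):
        self.line_in_reach = line_in_reach  # will a Search meet a line?
        self.on_line = False                # is special point X on a line?

    def move_until_line(self, theta, r):
        # Move in direction theta; True if a line appears on the retina
        # before distance r is covered.
        return self.line_in_reach

    def snap_to_vertex_or_line(self):
        # Put X on the nearest vertex if one is visible, else on the nearest line.
        self.on_line = True

    def follow_to_vertex(self, theta):
        # Move X along the line nearest to angle theta until a vertex is reached.
        pass

    def report(self):
        # Placeholder for the report sent back to the automaton.
        return "Report"


def scanner_cycle(eye, instruction):
    """Run one instruction from the automaton and return the scanner's answer."""
    name, *args = instruction
    if name == "Search":
        theta, r = args
        if not eye.move_until_line(theta, r):
            return "Miss"                   # distance r covered, nothing seen
        eye.snap_to_vertex_or_line()
        return eye.report()
    if name == "Follow":
        (theta,) = args
        if not eye.on_line:
            raise ValueError("Follow requires special point X to be on a line")
        eye.follow_to_vertex(theta)
        return eye.report()
    raise ValueError(f"unknown instruction {name!r}")


eye = ToyEye(line_in_reach=True)
print(scanner_cycle(eye, ("Search", 0.5, 2.0)))
print(scanner_cycle(eye, ("Follow", 1.0)))
```

The scanning problem discussed above is then the question of how the automaton should choose the next instruction to pass to a function like scanner_cycle.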

The following suggestion and elementary results are due to George Huff.
Before giving the formal definition, let me first give an example. Take
the figure [figure not reproduced]. The idea is to label each vertex with
capital letters, say [labeled figure not reproduced], and put an element
in the coding for each line,
i.e., {AG, GCH, ABCDE, BF, HD} would be a coding for the figure. One
could substitute GA for AG and still have a coding, but it is important
to have only one element of the coding for each line in the figure. To
exclude the possibility of putting two elements for each line into the
coding is the reason for ordering the vertices in the original figure.
Any other method of achieving this would also be satisfactory. Moreover,
it would also be possible, in the above example, to label the vertices
a different way, as long as one uses A-H, and to get different codings this
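way. All such codings would be equivalent, as the following theorem shows.

Suppose X is a figure with vertices $v_1, v_2, \ldots, v_n$ ordered in
some way.

Definition: $\mathcal{Z}$ is a coding for a figure X if $\mathcal{Z}$ is a set, and there
is a 1-1 function $C_{\mathcal{Z}}$ which maps the vertices of X onto an initial
segment of the Roman alphabet of capital letters (with its usual
order, subscripts if necessary) such that

$$\mathcal{Z} = \{\, C_{\mathcal{Z}}(v_{i_1}) C_{\mathcal{Z}}(v_{i_2}) \cdots C_{\mathcal{Z}}(v_{i_m}) : v_{i_1} v_{i_2} \cdots v_{i_m} \text{ denotes a line in the figure } X \,\},$$

where each line is written with endpoints such that $v_{i_1}$ precedes
$v_{i_m}$ in the ordering of the vertices, and with central vertices
$v_{i_2}, \ldots, v_{i_{m-1}}$ in that order from $v_{i_1}$.

Definition: $\mathcal{Z} R X$ iff $\mathcal{Z}$ is a coding for X.

Definition: $\mathcal{Z} \equiv \mathcal{Y}$ iff there is a figure X such that $\mathcal{Z} R X$ and $\mathcal{Y} R X$.

Note that "$\equiv$" is reflexive and symmetric.

Lemma: $\mathcal{Z} \equiv \mathcal{Y}$ iff there is a permutation p of the vertices of $\mathcal{Z}$ such
that $p\mathcal{Z} = \mathcal{Y}$, where
$p\mathcal{Z} = \{\, p(A_{i_1}) \cdots p(A_{i_m}) : A_{i_1} \cdots A_{i_m} \in \mathcal{Z} \,\}$.

Proof: ($\Rightarrow$) Denote the vertices of $\mathcal{Z}$ by $A_1, \ldots, A_n$, those of $\mathcal{Y}$ by
$B_1, \ldots, B_n$, so that $\{A_1, \ldots, A_n\} = \{B_1, \ldots, B_n\}$. Suppose there is a
figure X such that $\mathcal{Z} R X$ and $\mathcal{Y} R X$. Define p by $p(A_i) = B_j$ iff there is a
vertex v of X such that $C_{\mathcal{Z}}(v) = A_i$ and $C_{\mathcal{Y}}(v) = B_j$. Clearly p is a
permutation of $A_1, \ldots, A_n$. Suppose $p(A_{i_1}) \cdots p(A_{i_m}) \in p\mathcal{Z}$. Then
$A_{i_1} \cdots A_{i_m} = C_{\mathcal{Z}}(v_{i_1}) \cdots C_{\mathcal{Z}}(v_{i_m}) \in \mathcal{Z}$, so $v_{i_1} \cdots v_{i_m}$ is a line in X.
Therefore $C_{\mathcal{Y}}(v_{i_1}) \cdots C_{\mathcal{Y}}(v_{i_m}) = B_{j_1} \cdots B_{j_m} \in \mathcal{Y}$. But
$p(A_{i_1}) \cdots p(A_{i_m}) = B_{j_1} \cdots B_{j_m}$. Hence $p\mathcal{Z} \subseteq \mathcal{Y}$. Similarly for
$\mathcal{Y} \subseteq p\mathcal{Z}$.

($\Leftarrow$) Suppose there is such a permutation p. Let X be such that $\mathcal{Z} R X$.
Want to show $\mathcal{Y} = p\mathcal{Z}$ is such that $\mathcal{Y} R X$. Define $C_{\mathcal{Y}}$ as follows:
$C_{\mathcal{Y}}(v) = p(C_{\mathcal{Z}}(v))$. Let
$\mathcal{U} = \{\, C_{\mathcal{Y}}(v_{i_1}) \cdots C_{\mathcal{Y}}(v_{i_m}) : v_{i_1} \cdots v_{i_m} \text{ is a line in } X \,\}$.
Show $\mathcal{U} = p\mathcal{Z}$. If $C_{\mathcal{Y}}(v_{i_1}) \cdots C_{\mathcal{Y}}(v_{i_m}) \in \mathcal{U}$, then $v_{i_1} \cdots v_{i_m}$ is a
line in X, so $C_{\mathcal{Z}}(v_{i_1}) \cdots C_{\mathcal{Z}}(v_{i_m}) \in \mathcal{Z}$, hence
$p(C_{\mathcal{Z}}(v_{i_1})) \cdots p(C_{\mathcal{Z}}(v_{i_m})) = C_{\mathcal{Y}}(v_{i_1}) \cdots C_{\mathcal{Y}}(v_{i_m}) \in p\mathcal{Z}$.
Thus $\mathcal{U} \subseteq p\mathcal{Z}$. Similarly $p\mathcal{Z} \subseteq \mathcal{U}$.

Corollary: $\equiv$ is an equivalence relation.

Proof: Need to show transitivity. If $\mathcal{Z} \equiv \mathcal{Y}$ and $\mathcal{Y} \equiv \mathcal{W}$, then there are
permutations $p_1, p_2$ such that $p_1\mathcal{Z} = \mathcal{Y}$ and $p_2\mathcal{Y} = \mathcal{W}$. Thus,
$p_2 p_1 \mathcal{Z} = \mathcal{W}$. Q.E.D.

It is now possible to define a mapping S from the set of figures into
the equivalence classes of codings by $S(X) = [\mathcal{Z}]$, where $\mathcal{Z}$ is such that
$\mathcal{Z} R X$. S induces an equivalence relation on the set of figures defined
by $X \equiv Y$ iff $S(X) = S(Y)$, i.e., $X \equiv Y$ iff any pair of codings for X and Y are
equivalent. What these equivalence classes look like and what group of
transformations the above equivalence relation remains invariant under
we do not know. Certainly the group of transformations is not one of the
ordinarily geometrically significant groups.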
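
The lemma suggests a direct, if inefficient, test for the equivalence of two codings: try every relabeling of the vertices and see whether one of them carries the first coding onto the second. The following is a minimal sketch of that idea, not a method taken from the text. It assumes that a coding is represented as a Python set of strings with one capital letter per vertex (no subscripts), and that an element and its reversal denote the same line, as with AG and GA above. The function names are hypothetical.

```python
from itertools import permutations


def normalize(element):
    """Treat an element and its reversal (AG, GA) as the same line."""
    return min(element, element[::-1])


def relabel(coding, mapping):
    """Apply a vertex relabeling p to every element of a coding."""
    return {normalize("".join(mapping[v] for v in element)) for element in coding}


def equivalent(coding_z, coding_y):
    """Brute-force search for a permutation p with pZ = Y."""
    labels_z = sorted(set("".join(coding_z)))
    labels_y = sorted(set("".join(coding_y)))
    if len(labels_z) != len(labels_y) or len(coding_z) != len(coding_y):
        return False
    target = {normalize(element) for element in coding_y}
    for image in permutations(labels_y):
        if relabel(coding_z, dict(zip(labels_z, image))) == target:
            return True
    return False


print(equivalent({"AB", "BC", "CA"}, {"AB", "BC", "AC"}))
print(equivalent({"ABC", "BD"}, {"ABC", "CD"}))
```

The search tries up to n! relabelings for n vertices, so it is practical only for small figures; it is meant to make the definition concrete rather than to serve as a recognition procedure.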

We decided to take this coding as our final formulation and work
with it. At this point, we spent some time working on learning, the
results of which are in Chapter 3, before returning to the coding.

The next question we asked was what would be a good way to recognize
geometrical predicates given only the coding. The answer would determine
to a large extent what we wanted our learning device to look like at
asymptote, and hence this is a crucial, and, as it turns out, quite
interesting, question. We concentrated on the predicate 'triangle,' or
'triangle in context,' which is true if and only if the figure contains a
triangle. Given a figure that consisted only of a triangle, the three
vertices will be labeled A, B, and C. The natural thing to do in this
case would be simply to teach the device to recognize the pattern AB,BC,CA.
This could be done by making the coding the input tape for a fairly simple
finite state automaton, which was highly desirable because we had been
dealing with finite state automata in our work on learning theory.

There is a problem even in this simple case, however. It is possible
that the figure is coded AB,BC,AC, or in some similar way. It occurred
to us that a good way to avoid this problem would be to put the coding
into some canonical form, but there is no natural way to do this for
even moderately complex figures and our efforts produced no way at all.
Thus, the device has to be able to recognize when different codings are
equivalent, and so already it is no longer a simple pattern-recognizing
device. Moreover, in order to deal with figures containing central
vertices, the device needs the ability to break a line with one or more
central vertices into all its possible segments. For example, in the
coding {ABCD,AE,CE,BF} it would have to find the segment AC before it
could find the triangle ACE. This example also illustrates the point
that in complex figures the vertices of a triangle could be labeled by
any three letters, so that the device has to be able to recognize the
same pattern regardless of the actual letters. Finally, it is clear that
if one simply gives the coding to a finite state automaton as its input
tape, the automaton will have to have more states the more complicated the
figure becomes. Thus, the automaton will have to grow as the figures do,
and hence it will be impossible to have an automaton with any fixed
number of states that will be able to recognize triangles in all contexts.
Hence, automata have much the same deficiency as perceptrons. Thus, this
natural approach has many difficulties and its main attraction, that it
connects naturally to the work we did in learning theory by way of finite
state automata, turns out to in fact lead to the very difficulty we set
out to solve, which is that no perceptrons of fixed complexity can recognize
predicates such as 'connected.'
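
The step of breaking a line with central vertices into all its possible segments is easy to make concrete. The sketch below assumes, as an illustration only, that each line of a coding is a string of single capital letters; the function names are hypothetical.

```python
from itertools import combinations


def segments(line):
    """All segments of one coded line: every pair of its vertices, in line order."""
    return [a + b for a, b in combinations(line, 2)]


def all_segments(coding):
    """The segments of every line in a coding, collected into one set."""
    return {segment for line in coding for segment in segments(line)}


print(segments("ABCD"))
print("AC" in all_segments({"ABCD", "AE", "CE", "BF"}))
```

A line with m vertices yields m(m-1)/2 segments, which gives some idea of how quickly a search over all triples of segments grows with the complexity of the figure.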

In the above example, it is clear that BF and CD are not on any
triangle merely from the fact that F and D occur in only one line. We
called such segments 'legs,' and noted that in cases besides the present
one legs are irrelevant in the sense that they can simply be deleted
without changing the value of the predicate. Thus, we thought that the
device should have the ability to delete these legs. We were still at
this point thinking of the device as sort of running through all possible
combinations of three line segments looking for triangles, and allowing it
to ignore some line segments would result in a great saving of effort.
This rather simple idea led to a rather fundamental change in our thinking:
instead of thinking of the device as a finite state automaton and worrying
how it could learn to make the correct transitions, we thought of the
device as being able to perform simple operations and learning consisted
in combining these simple operations into more complex operations that
would be able to recognize the appropriate predicates.
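
Deleting legs can be sketched as an operation on the coding alone. The version below is one possible reading, offered as an assumption rather than as the operation the device actually used: a leg is taken to be the end segment of a line whose end vertex occurs in no other line, and legs are stripped repeatedly until none remain. Lines are again assumed to be strings of single capital letters.

```python
from collections import Counter


def delete_legs(coding):
    """Repeatedly strip end vertices that occur on only one line."""
    lines = set(coding)
    while True:
        counts = Counter(v for line in lines for v in line)
        trimmed = set()
        for line in lines:
            if counts[line[0]] == 1:           # leg at the start of the line
                line = line[1:]
            if line and counts[line[-1]] == 1:  # leg at the end of the line
                line = line[:-1]
            if len(line) >= 2:                  # a single vertex is no longer a line
                trimmed.add(line)
        if trimmed == lines:
            return lines
        lines = trimmed


print(sorted(delete_legs({"ABCD", "AE", "CE", "BF"})))
```

On the example coding this removes BF and CD and leaves the lines ABC, AE and CE. The label B survives as a central vertex of ABC even though no other line passes through it any longer, which is harmless for the predicate 'triangle.'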

At this point, we came up with a much better way of recognizing 
triangles, which confirmed this change in our thinking. In final form, 
this method consisted of picking a line in the figure, taking the set of
lines that crossed this line, and checking to see if any vertex occurred
on two of these lines: if one did, there was a triangle; if not, the
original line could be deleted and the process repeated to see if there
were any triangles in the whole figure. Moreover, if a vertex is found
that does occur in two lines in the set, it is easy to find the other
two vertices of the triangle, and hence if the device has an output
mechanism, it can in fact list all the triangles in the figure. This
method indicated to us that it would be fruitful to think of all the basic 
operations as being set-theoretical in nature, since what is required in 
the above method is only the ability to form sets and make deletions. 
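
The method just described can be sketched using nothing but set formation and deletion. The following is an illustrative rendering under stated assumptions: lines are strings of single capital letters, two distinct lines share at most one vertex, and the function name and the output format (a sorted list of three-letter strings) are choices made here, not taken from the text.

```python
from itertools import combinations


def find_triangles(coding):
    """List the triangles in a coding by the pick-a-line method."""
    remaining = sorted(coding)
    triangles = []
    while remaining:
        line = remaining.pop(0)                 # pick a line, then delete it
        on_line = set(line)
        crossing = [m for m in remaining if set(m) & on_line]
        for m, n in combinations(crossing, 2):
            # A vertex on two crossing lines, but not on the picked line,
            # is the third corner of a triangle.
            for apex in (set(m) & set(n)) - on_line:
                corners = {apex} | (set(m) & on_line) | (set(n) & on_line)
                if len(corners) == 3:
                    triangles.append("".join(sorted(corners)))
    return triangles


print(find_triangles({"ABCD", "AE", "CE", "BF"}))
```

Because a line is deleted once it has been examined, each triangle is reported exactly once, at the moment the first of its three lines is picked.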

We next discussed the types of simple operations in general terms. 
I will give the final result of this discussion, although it didn't come
out until later. Originally we had in mind three types of operation:
set operations, deletions (erasing line segments), and constructions 
(adding line segments). The first two can be done entirely within the 
coding, i.e*, it is not necessary to go back to the figure from which 
the coding was obtained to do these operations. This is not true of the 
constructions, for one can*t tell from the coding for a figure if a line 
segment that is added between points on two lines will intersect other 
lines in the figure or not. A machine that can make constructions is 
more powerful than one that can*t| in particular, there are two obvious 
things it can do that a machine without this ability couldn't: it can 
recognize whether a polygon is convex or concave (by connecting all its 
vertices and seeing whether or not all the added lines intersect) and the 
ins ide/out side of a polygon (for a convex polygon it is possible to 
draw lines to each of the sides withoixt intersecting the polygon from a 
point on a line segment if and only if that segment is inside, and 
combining this with the first construction takes care of the concave case#) 
A machine without this ability can't do this since the figures in each 
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of the following pairs have the same^ i.e.? equivalent codings: 




[Figure omitted: pairs of figures with equivalent codings]

We decided to ignore the construction operations, which has the effect
of ignoring the figures and seeing what can be done with the coding
alone. We next discussed
whether we should have generalized or parameterized deletion operations. 
To take the legs example, a generalized deletion operation would simply 
delete all legs at once, and repeat this until all legs were removed, 
while a parameterized operation would remove one leg at a time, the
particular leg being removed having to be specified by its endpoints
(the parameters). We chose the parameterized version since it is more
powerful (it can do things the general operations can't), simpler
(everything can be done with one operation), and less arbitrary (just
which general operations to allow would be to a certain extent an
arbitrary choice). Furthermore, this type of deletion allows for more
learning, since less is built into the machine to start with, which is
good since it is intuitively the more natural approach, but bad since the 
learning is complicated. The choice was confirmed when we discovered 
that the parameterized deletion rules could be easily formulated using 
the set operations. 

The above work was a joint effort of the three of us. Now definite
problems had been defined and a definite framework for solving them set 
up. The rest of this chapter is concerned with my attempt to solve these 
problems and is my independent contribution to this problem. 

The problem now was to see how much could be done within the framework
we had agreed on. A formally nice approach would be to write down the
basic operations and the ways they could be combined to get more complicated
routines and then prove that a device that could do these things could
recognize in a reasonable way the predicates that it intuitively ought
to be able to. Such an approach proved to be unfruitful, however. First
of all, a coding is essentially a set of n-tuples, so that any
set-theoretical operation that can be performed on a set of n-tuples can be
used on a coding. Specifying particular operations adds very little to
an understanding of the present problem. Secondly, the selection of
particular operations is primarily a question for the learning theory;
operations will be chosen not for their mathematical elegance, but because
they facilitate the learning of geometrical predicates. Moreover, to
choose them intelligently requires that one knows what routines are
needed to recognize the desired predicates, and this was lacking at the
time. Finally, allowing all set-theoretical operations is adequate for
determining which predicates can be recognized from the coding, for it
is obvious that whatever can be recognized can be recognized using them.
This is an interesting problem, and solving the other problems satisfactorily
depends on its solution. For these reasons, I allowed myself to use any
set-theoretical operations which seemed useful in dealing with codings.

The first thing to notice is that since we have deletion rules, the 
definition of a coding requires a slight alteration; it is no longer 
desirable to require that the coding be an initial segment of the alphabet. 
The reason is that even though the coding of the original figure satisfies 
this requirement, figures obtained from it by deleting segments don't 
necessarily satisfy it, since it is possible to delete all the segments 
containing a certain vertex and still have vertices with labels from 
later in the alphabet left. Relaxing this requirement doesn't affect
Huff's results, except that p cannot be taken to be a permutation, but
instead must just be a 1-1, onto map from one subset of the labels for
vertices to another. For notational convenience, I am going to restrict
the set of labels for vertices to capitals from A-H (with subscripts), and
call this set I.

One can regard a coding as either a set of words of I or as a set
of n-tuples of elements of I. The first method is more natural when one
is dealing with automata, but the second way is more natural in the
present context, since the operations are set operations. Thus, what I
will now call a coding is the set of n-tuples obtained in the natural
way from the original coding. I will write these n-tuples the same way
as they were written in the original coding, e.g., ABC.

Definition: An n-tuple b of elements of I is a line iff n ≥ 2
and no element of I occurs more than once in b.
Thus, every coding for a figure is a set of lines, but the converse
is false. A trivial way a set of lines could fail to be a coding for a
figure would be to have a vertex occur on only one line in the set, and
on that line as a central vertex, e.g., {ABC} is not a coding for a
figure. I shall call a set of lines with no such vertices a good set
of lines. This example shows that a subset of a coding is not necessarily
a coding. There are, in fact, good sets which aren't codings, but I will
return to this question later. I use small letters b-h as both names and
variables for lines, U to denote a coding and V to denote an arbitrary
set of lines. Thus, operations that can be performed on V can also be
performed on U, and thus definitions that apply to V apply also to U.

There are a couple of obvious things that apply to any coding. 
Given any line in a coding, it is easy to tell the labels for its endpoints 
from the labels for its central vertices, since the former are the first 
and last elements of the n-tuple, while the latter are the remaining 
elements. It is also easy to determine the number of lines a vertex is 
on; simply count how many times it occurs in the coding. A formally 
better way would be to form the set of all lines on which it occurs, and 
take the cardinality of the set. I now want to restrict my attention to 
codings for figures that have at most two lines intersecting at each 
vertex, i.e., in which no vertex in the coding occurs on more than two 
lines. Thus, each vertex can be classified into one of four categories, 
depending on how many lines it occurs on and whether or not it is an end 
or central vertex on these lines. The four categories are single-end
(occurs on one line), double-end (occurs as endpoint on two lines),
double-central (occurs as central point on two lines), and end-central
(occurs as endpoint on one line and central point on another). Notice
that the remark in the preceding paragraph amounts to saying, in this 
terminology, that single central vertices cannot occur in a coding for 
a figure. These four categories are exactly the categories (see p. 36) 
a,b,d and c, respectively, that the original scanner sent to the 
automaton. Thus, presenting the automaton with a coding is in fact 
equivalent to presenting it with the information that it could get from 
the scanner, minus the distances and angles, which is what we wanted to 
do. For convenience, I will henceforth say "A is in V" instead of "A
is on a line in V," and use the phrase "remove A from V" to denote the
operation of replacing all n-tuples of the form qAr by the n-tuples qr.
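
The vertex categories and the "remove A from V" operation just described can be sketched in a few lines of Python. The representation is an assumption of this sketch and not part of the original text: a line is a tuple of labels, a set of lines is a Python set of such tuples, and the function names are hypothetical.

```python
from collections import defaultdict

def classify_vertices(lines):
    """Map each label to its category in a set of lines (tuples of labels)."""
    roles = defaultdict(list)  # label -> one "end"/"central" entry per line
    for line in lines:
        for pos, label in enumerate(line):
            at_end = pos == 0 or pos == len(line) - 1
            roles[label].append("end" if at_end else "central")
    names = {
        ("end",): "single-end",
        ("central",): "single-central",  # never occurs in a coding for a figure
        ("end", "end"): "double-end",
        ("central", "central"): "double-central",
        ("central", "end"): "end-central",
    }
    # Labels on more than two lines fall outside the four categories.
    return {v: names.get(tuple(sorted(r)), "other") for v, r in roles.items()}

def remove_vertex(lines, label):
    """'Remove A from V': replace every n-tuple qAr by qr."""
    return {tuple(x for x in line if x != label) for line in lines}

# A triangle ABC whose side AC is extended to D.
V = {("A", "B"), ("B", "C"), ("A", "C", "D")}
categories = classify_vertices(V)
```

Here `categories` assigns C the category end-central (endpoint of BC, central on ACD) and D the category single-end, which is the information the scanner would have supplied without distances and angles.
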

The parameterized deletion operation can now be stated precisely.
This operation on the coding corresponds to erasing a simple line segment,
one with no central vertices, on a figure for which it is a coding.

Definition: AB is a simple line segment in V if there is a
b ∈ V such that b = qABr, where q, r are n-tuples of elements
of I, n ≥ 0.

Intuitively, there are three cases: q,r both empty (AB is the whole 
line), only one empty (AB is the last segment on a longer line), and both 
non-empty (AB is in the middle of a longer line). Originally, I wrote 
four operations to cover these three cases (two rules for the second case), 
but the following rule covers all the cases. 

Deletion Operation: To delete simple line segment AB from
V, form the set V' by replacing qABr with the two elements
qA, Br; form V'' by deleting all 1-tuples from V'; remove all
single central vertices from V''.
V' is not necessarily a set of lines. V'' is a set of lines, but not
necessarily a good set, but the final result is a good set. More
importantly, assuming that one started with a coding, the final result
is a coding. Indeed, the result is a coding for the figure obtained by
erasing the segment AB in any figure which U is a coding for. The
converse is not true; it is possible to obtain a coding by deleting a 
segment from a good set of lines that is not a coding, as I will show 
when I take up this question later. 
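
The three steps of the deletion operation can be sketched as follows. This is only an illustration under stated assumptions: lines are tuples of labels, a set of lines is a Python set, the segment may be given in either orientation, and `delete_segment` is a hypothetical name.

```python
def delete_segment(lines, a, b):
    """Apply the deletion operation to the simple segment ab in a set of lines."""
    # First step: replace the line qABr by the two pieces qA and Br.
    v1, found = set(), False
    for line in lines:
        cut = next((i for i in range(len(line) - 1)
                    if {line[i], line[i + 1]} == {a, b}), None)
        if cut is None:
            v1.add(line)
        else:
            found = True
            v1.update((line[:cut + 1], line[cut + 1:]))
    if not found:
        raise ValueError("not a simple segment of any line")
    # Second step: delete all 1-tuples.
    v2 = {line for line in v1 if len(line) >= 2}
    # Third step: remove single central vertices, i.e. labels that occur
    # on exactly one line and not at either end of it.
    count = {}
    for line in v2:
        for x in line:
            count[x] = count.get(x, 0) + 1
    return {tuple(x for i, x in enumerate(line)
                  if count[x] > 1 or i in (0, len(line) - 1))
            for line in v2}

good_set = {("A", "B", "C"), ("A", "D", "E"), ("B", "E"), ("D", "C")}
result = delete_segment(good_set, "D", "C")
```

The example input is the good set {ABC,ADE,BE,DC} discussed later in the chapter; deleting DC leaves D as a single central vertex on ADE, and removing it gives {ABC,AE,BE}.
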

This parameterized deletion rule can do whatever any generalized 
deletion operation could do. A generalized operation deletes all simple 
segments of a certain type. Obviously, some restriction on the 
classification of simple segments is necessary to make these operations 
meaningful basic operations, e.g., one wouldn't want a basic operation
that said delete all segments on a hexagon. The natural restriction to
place on the classification is that it can depend only on the configurations
(see p. 45) at each endpoint. Since it is possible to recognize the four
types of vertices from the coding, it is possible to recognize the 16
types of simple segment. Hence, any of the 16 possible generalized
deletion operations can be performed by deleting all segments of a certain
type one by one.

I now want to state formally the informal method of recognizing
triangles that was mentioned above. By saying that there is an effective
procedure for recognizing triangles from the coding I mean that there is
a way of actually listing the three vertices of each triangle. This is
a slight departure from the learning set-up we originally envisioned,
which consisted simply of yes and no answers, but there are three reasons
for it. First, only slight additions are needed to a procedure that can
answer yes or no correctly to the question "Does U contain a triangle?"
to get a procedure that can list the vertices of each triangle. Secondly,
it is possible to ask a question like "Is there a point which is the
vertex of 7 triangles?" that is answered most naturally by listing the
triangles. Finally, as is obvious in the case of connectedness, being
able to list the information about simple predicates is a big help in
being able to give yes and no answers about more complicated predicates.
The actual names of the vertices that the device uses are internal to
it, but it could identify the vertices by location so that it would be
possible to directly check to see if it was actually recognizing the
predicate correctly. Otherwise, this could be checked indirectly by
asking questions like the one involving seven triangles.

Theorem 1. There is an effective procedure for listing all
the triangles in a figure from the coding for the figure.

Proof. Pick a line b from U. Form the set U-{b}, and take all the
lines in this set which have a vertex in common with b, getting a set W.
Every vertex A which occurs on two lines in W is the vertex of some
triangle. The other two vertices are the vertices which the lines A is
on have in common with b. All triangles which have a segment of b for
a side are found by this procedure, so now it is possible to delete b
and repeat the procedure, and thus get all the triangles in U. This
process terminates since U contains only finitely many lines. Q.E.D.
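
The procedure in the proof can be sketched as below. The sketch makes assumptions that are not in the text: lines are tuples of labels, two distinct lines share at most one vertex (as straight lines do), each pair of lines in W is examined so that vertices on more than two lines are also handled, and `list_triangles` is a hypothetical name.

```python
from itertools import combinations

def list_triangles(coding):
    """Return the vertex sets of all triangles, following the proof of Theorem 1."""
    remaining = sorted(coding)        # fixed order, only for reproducibility
    triangles = set()
    while remaining:
        b = remaining.pop()           # pick a line b; the list is now U - {b}
        on_b = set(b)
        w = [c for c in remaining if on_b & set(c)]   # W: lines meeting b
        for c, d in combinations(w, 2):
            # An apex is a vertex shared by c and d that is not on b.
            for apex in (set(c) & set(d)) - on_b:
                p = next(iter(on_b & set(c)))   # where c meets b
                q = next(iter(on_b & set(d)))   # where d meets b
                if p != q:
                    triangles.add(frozenset((apex, p, q)))
    return triangles

# Triangle ABC with D on side BC and the extra line AD.
coding = {("A", "B"), ("A", "C"), ("B", "D", "C"), ("A", "D")}
found = list_triangles(coding)
```

For this coding the sketch reports the three triangles ABC, ABD and ACD. Each triangle is found when the first of its three lines is picked as b, since the other two lines are then still in the remaining set.
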

This proof is given for the case where at most two lines meet in
a point, but it generalizes easily to the general case (where any number
of lines can meet at a point). Simply form the set of lines crossing a
given line as above, and for each vertex which occurs more than twice in
this set each pair of lines which meet at this vertex are the sides of a
triangle. The next theorem is important for two reasons: it plays a
crucial role in later work, and it was this predicate that perceptrons
failed on. For these reasons I give a detailed proof. First, the
required definition.

Definition: A coding is connected if for every pair of lines c, d
there is a sequence $\langle b_0, b_1, \ldots, b_n \rangle$ such that
$b_0 = c$, $b_n = d$, and for all $i$, $0 < i < n$, $b_i$ has a vertex
in common with $b_{i-1}$ and $b_{i+1}$.

Since the figures we are dealing with contain only straight lines, a
figure is connected if and only if the coding for the figure is
connected.

Theorem 2. There is an effective procedure for listing the
components of a coding U, and hence for recognizing connectedness.

Proof. Pick a line b in U. Define the sets $M_i$, $N_i$ recursively as
follows: $M_1 = \{b\}$, $N_1 = U - M_1$, $M_{i+1} = \{c \in N_i : c$ has
a vertex in common with a line in $M_i\}$, $N_{i+1} = N_i - M_{i+1}$.
These sets can be found in an effective way from the coding. The $N_i$
are a decreasing sequence, i.e., for all i, $N_{i+1} \subseteq N_i$.
Since U has only finitely many elements, there is some number m such
that $N_{m+1} = N_m$. Let p be the least such number. I claim that
$U - N_p$ is a component of U. To establish this I must show that
$U - N_p$ is connected and that no line not in $U - N_p$ is connected
to a line in $U - N_p$. This requires three simple lemmas.

Lemma 1. For all n, $U - N_n = \bigcup_{i \le n} M_i$. Proof is by
induction on n, but I omit it.

Lemma 2. $\bigcup_{i \le n} M_i$ is connected. Proof by induction on n.

a. n = 1. $\bigcup_{i \le 1} M_i = \{b\}$, and is connected.

b. Suppose $\bigcup_{i \le n} M_i$ is connected. Show
$\bigcup_{i \le n+1} M_i$ is connected. I will show that any two lines
in $M_{n+1}$ are connected, which is the hardest case. If $b_0$, $b_m$
are in $M_{n+1}$ then they each cross at least one line in $M_n$, say
$b_1$, $b_{m-1}$, respectively. By assumption there is a sequence
$\langle b_1, \ldots, b_{m-1} \rangle$ which connects $b_1$ to
$b_{m-1}$. Thus, the sequence
$\langle b_0, b_1, \ldots, b_{m-1}, b_m \rangle$ connects $b_0$ to $b_m$.

Lemma 3. $\bigcup_i M_i = \bigcup_{i \le p} M_i$. Proof.
$M_{p+1} = \emptyset$ since $N_{p+1} = N_p - M_{p+1} = N_p$. But then
$M_{p+2} = \emptyset$, since no line crosses an element of the empty
set. Thus, for all $k \ge 1$, $M_{p+k} = \emptyset$.

That $U - N_p$ is connected follows from Lemmas 1 & 2. Suppose that a
line c is connected to a line in $U - N_p$. Then it is connected to b,
so there is a sequence $b_0, \ldots, b_m$ of lines that cross, where
$c = b_m$. Hence, $c \in \bigcup_i M_i$, and thus, by Lemmas 1 & 3,
$c \in U - N_p$. This process can be repeated on $N_p$, and then again,
until all components of U are found. U is connected if and only if
there are 0 or 1 components. Q.E.D.
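
The construction in the proof translates almost directly into code. The following is a minimal sketch, assuming lines are tuples of labels and a set of lines is a Python set; the function names are hypothetical, and the variables `m` and `n` play the role of the successive sets M and N of the proof.

```python
def component_containing(lines, b):
    """Return the component of `lines` that contains the line b."""
    m = {b}                      # the first M: just the chosen line
    n = set(lines) - m           # the first N: everything else
    component = set(m)
    while m:
        # Next M: lines of the current N sharing a vertex with a line of M.
        m = {c for c in n if any(set(c) & set(d) for d in m)}
        n -= m                   # next N: the current N minus the new M
        component |= m
    return component

def components(lines):
    """List all components by repeating the procedure on what is left."""
    rest, found = set(lines), []
    while rest:
        comp = component_containing(rest, min(rest))
        found.append(comp)
        rest -= comp
    return found

def is_connected(lines):
    """Connected means 0 or 1 components, as in the theorem."""
    return len(components(lines)) <= 1

U = {("A", "B", "C"), ("C", "D"), ("E", "F"), ("F", "G")}
parts = components(U)
```

The loop stops as soon as a new M is empty, which is the point at which N stops shrinking. Nothing in the sketch depends on how many lines meet at a vertex or on the input being a coding.
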

Notice that this procedure makes no use of the fact that only two 
lines intersect in a point, and hence is good for the general case. 
Also, given this theorem, it is easy to tell if a line segment AB is on
a polygon in U, since AB is on a polygon if and only if the component 
of U containing AB is non-empty and connected after AB is deleted. 
Moreover, as will be useful later, it really makes no difference whether 
or not U is a coding, but the same definition of connectedness and the 
same procedure will work for an arbitrary set of lines. Incidentally, 
this theorem shows that our particular approach and the perceptron 
approach are incomparable, i.e., that neither can do everything that the 
other can. Our approach can recognize connectedness, while the perceptron 
approach can't; but the latter can recognize the predicate 'rectangle,'
while our approach can't. 

I tried to extend the procedure for recognizing triangles to a method
for recognizing all types of polygons by adding lines to a given line as
in Theorem 2. This makes it necessary to treat polygons with an even
number of sides separately from those with an odd number. To get
quadrilaterals, for instance, one would take all the lines that cross two
lines in the set of lines that cross the original line b. One would get
pentagons by taking all lines, except b, that cross a line in this set,
and seeing if a vertex occurs twice in it. This idea doesn't work out
since for quadrilaterals the following case [figure omitted] would appear
to be a quadrilateral. The following pentagon [figure omitted] would not
get recognized since four of its sides are added at once. Things get worse
as the number of sides increases, so this method proved to be infeasible.
I then thought maybe it would be possible to break the figure down
into simple regions, i.e., in [figure omitted] 1, 2 & 3 are simple regions,
and combine these to get all the polygons. This attempt led to the
discovery that it is impossible to recognize simple regions from the
coding, since, for example, these two figures have the same coding but
different simple regions: [figure omitted]. At this point I
discovered a simple example of a good set of lines that isn't a coding:
{ABC,ADE,BE,DC}. Intuitively, this would be the coding for the figure
[figure omitted] with F omitted. This example also shows how a good set
of lines can be converted into a coding by deletion, since {ABC,AE,BE}
would be the result of deleting DC, and it is a coding. Characterizing
necessary and sufficient conditions for a good set to be a coding is a
very natural problem that turns out to be quite difficult.

It is obvious that a coding contains enough information to enable 
one to list all the polygons in a figure for which it is a coding. Since 
it is impossible to recognize concave/convex, inside/outside, and simple 
regions it seems that this is about all one can hope for, and so it seems 
to be a good test of the adequacy of any learning device to see if it 
could learn to recognize all the polygons. Before trying to build a 
device that could learn to do this, however, it would be nice to know 
how to do it for oneself, and thus know what sort of things the device is 
going to have to be able to learn. 

After several fruitless attempts, I finally came up with a method
for breaking a figure into simpler figures. Take the figure
[figure omitted] and look at vertex A. It occurred to me that it would be
possible to replace this figure by three figures, in each of which a
different simple line segment containing A had been deleted, i.e., by
[figures omitted]. Now each polygon in the original figure is in one of
the new figures, and each can be gotten easily by deleting legs. In a
more complicated case, a polygon might be in more than one of the
resulting figures, but this duplication presents no problem. Notice that
this breaking a figure into several simpler figures allows for the
possibility of parallel computation, which is desirable. Moreover, it
seems that this method is fairly close to the method people would use in
solving this problem, and certainly it is much more intuitively desirable
than the first method I tried.

I finally realized that the restriction to two lines meeting at a
point was unnecessary, since this procedure, like the triangle and
connectedness procedures, does not depend on the restriction. For example,
it is easy to break a figure like [figure omitted].
Thus, dealing with the general case is really not much more difficult,
and in a way it is easier, since it led me to concentrate on general
features rather than on ad hoc devices for the special case where at most
two lines meet at a point. I henceforth dealt entirely with the general
case, and this led to a surprisingly simple procedure for recognizing
polygons.

Before stating the theorem, it is necessary to introduce some 
notation and definitions. 

Definition: AB is a segment in U if there is a line
b = qArBt, where q, r and t may be either empty or non-empty.

Definition: P is a polygon in U if P is connected and
P = {segments: each vertex in P occurs in P twice and
only twice}.

If U is a coding and P is a polygon in U, then the segments labeled by
elements of P form the perimeter of a polygon in every figure for which
U is a coding.

Definition: A broken line is a set of simple segments
in which two vertices occur once and the rest occur twice.

Definition: A broken line is a leg if it is such that no
segment on it is on the perimeter of any polygon or part
of a broken line between two polygons.

The legs are the stray lines attached to one polygon. Thus, the terminology
is appropriate. It is now possible to state formally the 'delete legs'
operation I mentioned informally.

Delete Legs Operation: Delete all simple segments in the
coding that have an endpoint that is on only one line.
Repeat until all such segments are removed.

This operation can be effectively performed given only the coding for a
figure, and it deletes all and only legs in the figure.
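
As an illustration, the delete legs operation can be sketched as a loop over the deletion operation restricted to end segments. The sketch assumes lines are tuples of labels and a set of lines is a Python set; the helper names are hypothetical. It uses the fact that deleting the end segment of a line amounts to trimming the free endpoint off that line, dropping any 1-tuple that results, and then removing single central vertices.

```python
def label_counts(lines):
    """Count on how many lines each label occurs."""
    count = {}
    for line in lines:
        for x in line:
            count[x] = count.get(x, 0) + 1
    return count

def delete_legs(lines):
    """Repeatedly delete end segments whose outer endpoint is on only one line."""
    lines = set(lines)
    while True:
        count = label_counts(lines)
        target = next((ln for ln in sorted(lines)
                       if count[ln[0]] == 1 or count[ln[-1]] == 1), None)
        if target is None:
            return lines          # no free endpoints are left
        # Trim the free endpoint; a leftover 1-tuple is dropped.
        trimmed = target[1:] if count[target[0]] == 1 else target[:-1]
        lines.remove(target)
        if len(trimmed) >= 2:
            lines.add(trimmed)
        # Remove single central vertices created by the deletion.
        count = label_counts(lines)
        lines = {tuple(x for i, x in enumerate(ln)
                       if count[x] > 1 or i in (0, len(ln) - 1))
                 for ln in lines}

# Triangle ABC with a leg DE attached at D on side BC.
with_leg = {("A", "B"), ("A", "C"), ("B", "D", "C"), ("D", "E")}
stripped = delete_legs(with_leg)
```

In the example, deleting DE leaves D as a single central vertex on BDC, so the result is the bare triangle {AB, AC, BC}. A figure with no polygons, such as the single line ABC, is reduced to the empty set.
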



Definition: For each vertex A in U, A* is the set of all
simple segments containing A.

Definition: U-A* is the set of lines obtained from U by
performing the deletion operation on each element of A*,
except that single central vertices are not removed.

Definition: A is a breaking point of U if U-A* is
disconnected.

Notice that U-A* is not necessarily a coding, or even a good set of lines.
However, it is simpler to leave the extra vertices in, and as remarked
after Theorem 2, connectedness applies to any set of lines.
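
A breaking-point test can be sketched as follows, again assuming lines are tuples of labels and using hypothetical function names. Deleting every simple segment at A without removing single central vertices comes to the same thing as cutting A out of each line it lies on and discarding any piece with fewer than two labels, which is what `remove_star` does.

```python
def remove_star(lines, a):
    """Delete every simple segment at vertex a, keeping all other labels."""
    out = set()
    for line in lines:
        if a in line:
            i = line.index(a)
            pieces = (line[:i], line[i + 1:])   # what is left on each side of a
        else:
            pieces = (line,)
        out.update(p for p in pieces if len(p) >= 2)   # 1-tuples are deleted
    return out

def count_components(lines):
    """Number of components, growing each one from an arbitrary line."""
    rest, total = set(lines), 0
    while rest:
        frontier = {rest.pop()}
        while frontier:
            labels = set().union(*map(set, frontier))
            frontier = {c for c in rest if labels & set(c)}
            rest -= frontier
        total += 1
    return total

def is_breaking_point(lines, a):
    """a is a breaking point if removing its segments disconnects the set."""
    return count_components(remove_star(lines, a)) > 1

# Two triangles ABC and ADE that touch only at A.
U = {("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("D", "E"), ("E", "A")}
```

For this U, removing the segments at A leaves only BC and DE, which are disconnected, so A is a breaking point; B is not, since the remaining lines are still joined through A.
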

Definition: If A is a breaking point of U and $U_1, \ldots, U_n$
are the components of U-A*, then each set
$U_i^A = U_i \cup \{FA \in A^* : F \in U_i\}$ is called a component of A.

Thus, all the components of A are codings since each $U_i^A$ can be obtained
from U by deleting all segments of A* that don't have an endpoint in $U_i$, and
then deleting the components of the resulting figure that don't contain A.
For notational convenience, let $\bar{A}$ be the cardinality of A*, and given any
figure U let $K_U = \{A \text{ in } U : \bar{A} \ge 3\}$. Also, for each component
$U_i$ of U-A*, [the remainder of this sentence is missing from the source].



Theorem 3. There is an effective procedure for listing all the
polygons in a figure U'.

Proof. I claim the following procedure will work. Continue this
procedure until no new codings are formed, and at this point C is
a list of all the polygons in U'.

[Flow chart; the boxes that can be recovered, in order:
  1. Break U' into components $U_1, \ldots, U_n$; each $U_i$ is a new coding.
  2. (Arrow.) For each new coding, delete legs to get U.
  3. Is $K_U = \emptyset$? Yes: list U in C. No: pick $A \in K_U$.
  4. Break U-A* into components $U_1, \ldots, U_n$; take each separately.
  5. $U_i \cup W_i$ is a new coding.
  6. Take each pair F, G in $W_i$.
  7. $U_i \cup \{FA, GA\}$ with single central vertices removed is a new
     coding.
  8. $U_i \cup \{FG\}$ with single central vertices removed is a new coding.
The conditions on the branches among boxes 5-8 are illegible.]



In talking about this procedure it is convenient to think of things
happening in stages. The arrow in the flow chart marks the beginning
of a new stage. Stage 0 is before the arrow is reached the first time,
stage 1 between the first and second times, etc. At each stage all the
new codings from the previous stage are processed simultaneously. The
proof that the procedure works depends on proving three lemmas about
what happens at each stage.

Lemma 1. If one begins at the arrow with a coding U and vertex A,
then everything in the flow chart that is called a new coding is a
coding, and is in fact connected.

Proof. Every X claimed to be a new coding is of the form $U_i$ union some
subset Y of $W_i$. If $Y = W_i$, $X = U_i^A$ and is a coding. If
$Y \subset W_i$, X can be obtained by removing elements in $W_i - Y$ from
$U_i^A$. This is so since $U_i$ was gotten originally by deleting A* from
U, but not erasing the single central vertices as required. Thus,
$U_i \cup \{FG\}$ or $U_i \cup \{FA, GA\}$ is what is obtained from $U_i^A$
by performing the deletion operation on the remaining elements of $W_i$
without erasing the single central vertices. When these are removed the
result is the same as $U_i^A$ with $W_i - \{FA, GA\}$ deleted, and hence
is a coding since it can be obtained from a coding by using the
deletion operation. Moreover, X is connected since it is a component
of U-A* union one or two segments that have one endpoint on a line in
$U_i$.

Lemma 2. Each element X listed in C is a polygon.

Proof. That each element of C is a connected coding with no legs follows
from Lemma 1, the fact that components of U' are codings and because all
legs are deleted right before an element is put in C. Thus, no vertex
occurs on only one line in X. But no vertex occurs on 3 or more simple
segments in X since $K_X = \emptyset$ means that there are no vertices A
such that $\bar{A} \ge 3$. Thus, there are no central vertices in X,
because if there were it would have to be a single central vertex because
a central vertex is on 2 simple segments in every line on which it is a
central vertex, which contradicts the fact that X is a coding. Thus, each
vertex occurs on exactly 2 segments. Thus, X is a set of segments and each
vertex occurs exactly twice in X, hence X is a polygon.

Lemma 3. For each coding U at the beginning of stage n, P is a polygon
in U if and only if P is put in C at stage n or there is a new coding X
formed at stage n such that P is a polygon in X.

Proof. Suppose P is a polygon in U. U has no legs and hence it is
either a polygon and goes in C, or it has a vertex A such that
$\bar{A} \ge 3$. There are two possibilities.

i) A is a breaking point of U. In this case P is a polygon in $U_i^A$ for
some i. Parts of P cannot be in two different components of A since if
they were these components would not be disconnected when A* is deleted,
and hence would actually be only one component. There are two
possibilities: no segment in A* is on P, in which case P is in each of the
new codings formed from $U_i$, or two segments in A* are on P, in which case
P is in the new coding formed when this pair of segments is chosen.

ii) A is not a breaking point of U. Then there is only one component,
which implies that P must be in one of the new codings by the above
argument.

If P is put in C, then P=U and hence P is a polygon in U. If P is
a polygon in some $U_i^A$, as it is if it is in a new coding X, then it is a
polygon in U.

By repeated application of Lemma 3, it follows that the set of polygons
in some new coding formed at stage n union the set of polygons put in C
on or before stage n equals the set of polygons in U'. The only thing
left to show is that at some stage m no polygons are in new codings
formed at stage m, i.e., at some stage m no new figures are formed.
This follows since for each new figure X formed from U at stage n,
$K_X$ has fewer elements than $K_U$, since $A \in K_U$ but $A \notin K_X$.
So if $K_{U'}$ has N elements, the process will terminate
on or before stage N. Indeed, the routine is set up so that this will
happen. Thus, at the first stage at which no new codings are formed, all
the polygons in U' are in C. Q.E.D.

There are several comments I would like to make about this procedure. 
The memory requirement is much greater than it was for the connectedness 
procedure. It is now necessary to store the original coding, the new 
codings and the list of polygons. The polygons could be put in the output 
as they are formed, but because of duplication it is still necessary that 
the device knows what has already been printed. The memory requirement 
for the new codings could be reduced by processing one new coding at a
time, but this would greatly increase the computation time. It would be
very desirable to eliminate the duplication that occurs in this procedure
to save time and cut down the memory requirement. It seems to me that
the resulting procedure would be very close to what people actually do,
but I was unable to come up with a good way of achieving this economy.
Also, the idea of an irrevocable deletion has been lost. Now the coding
is divided into parts and different things are done to the parts and then
they are recombined. This is a more powerful procedure, and it is
intuitively plausible since it is possible to ignore part of a figure
and analyze the rest of it, and then analyze a different part if this
doesn't lead to satisfactory results. An interesting problem concerning
this procedure was suggested by Professor Jaakko Hintikka: In the
procedure given, the vertex A around which the coding is decomposed is
chosen at random. It seems reasonable that certain strategies for
choosing A would lead to a more efficient procedure than random selection.
In particular, it seems as if it might be wise to choose A so that it is
on the greatest number of simple segments. I have no concrete results on
this problem, however. 

I next tried to solve the representation problem, i.e., find 
necessary and sufficient conditions for a good set of lines to be a 
coding. The attempt has led to many interesting results, but as of now 
it has not produced the desired theorem. I will now cover some of the
work I did for three reasons: many of the results are of independent
interest, listing some of them will serve to indicate the complexity
of the problem and perhaps be of use to others who might be interested 
in this problem. 

The first thing to notice is that the size of the figure makes no 
difference since the figure can be expanded or shrunken without changing 
the coding. Also, there is no problem in constructing a figure that 
contains no polygons, for one can just start anywhere and draw the lines 
and run into no problem of lines that aren't supposed to intersect
intersecting. Some of the lines may get pretty small, but this is of 
no theoretical significance. Thus, it would be nice to have a list of 
all the polygons for any good set of lines to aid in determining if it 
is in fact a coding. The procedure of Theorem 3 will produce such a list,
however, since on closer examination of Theorem 3 it is clear that it is
not essential that U' or all the new codings are in fact codings. It is
also true that legs can always be added to any figure, even 'legs' like
[figure omitted], for the square can be shrunken arbitrarily small and so
this case is really no different from the case of an ordinary leg.

It seems to me that the best approach to this problem is to take the
coding apart at all its breaking points and try to draw a figure for each
component separately, and then try to fit them together to get the
desired figure. There are two difficulties that could arise: the
breaking points could be inside a polygon in both components, e.g., if
in the coding for a figure containing [figure omitted] there were a
line between A and B not crossing either square, or if A and B were the
same vertex, then the alleged coding would not actually be a coding for
any figure. Also, two concave polygons can't always be joined, e.g.,
[figure omitted] can't be joined at the marked points. Trying to
see which angles could be fitted together led to the following result,
which shows there is no problem for angles less than 180°.

Theorem 4. If U is a coding for a figure F in which angle $\alpha$ is
less than a straight angle, then U is a coding for a figure G in which
$\alpha < \varepsilon$ and a figure G' in which
$\alpha > 180^\circ - \varepsilon$, for any $\varepsilon > 0$.
Proof. Let b, c be the sides of α and A be their point of intersection.
Draw a line through the endpoints of these sides and take this to be the
x-axis and the perpendicular to this line from A to be the y-axis.

[figure]

Every vertex in this figure has a coordinate (x_i, y_i) in this coordinate
system. For any positive number r, if we set up a similar coordinate
system and give each vertex coordinates (x_i, r y_i), and connect them the
same as in F, the result will be a figure F' with the same coding. The
only thing to check is that points that lie in a straight line in the
first coordinate system have their images in the second coordinate system
lying on a straight line. I omit the verification of this point. If r
is small enough, α will be very close to a straight angle. Specifically,
take the shorter of the two lines, say b, and choose r so that
rh < x_b sin(ε/2). Then the resulting figure is an appropriate G'. By
taking r very large one can get an appropriate G. Q.E.D.

As an immediate corollary, we get that a similar result holds for angles
greater than 180°.
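
The rescaling used in the proof can be checked numerically. What follows
is only a sketch under stated assumptions, not part of the original
argument: the function names and the sample coordinates are hypothetical,
the x-axis is taken through the endpoints of the two sides, and every
y-coordinate is multiplied by r as in the proof.

```python
import math


def scale_y(points, r):
    """Map each vertex (x, y) to (x, r*y), as in the proof of Theorem 4."""
    return [(x, r * y) for x, y in points]


def angle_at(a, p, q):
    """Angle in degrees at vertex a between the segments a-p and a-q."""
    v1 = (p[0] - a[0], p[1] - a[1])
    v2 = (q[0] - a[0], q[1] - a[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def collinear(p, q, s, tol=1e-9):
    """True if the three points lie on one straight line."""
    cross = (q[0] - p[0]) * (s[1] - p[1]) - (q[1] - p[1]) * (s[0] - p[0])
    return abs(cross) <= tol


# Hypothetical sample: A is the vertex of the angle, B and C are the
# endpoints of its sides and lie on the x-axis.
A, B, C = (0.0, 1.0), (-1.0, 0.0), (2.0, 0.0)

flat = scale_y([A, B, C], 0.001)    # small r: angle near 180 degrees
sharp = scale_y([A, B, C], 1000.0)  # large r: angle near 0 degrees
```

Scaling one coordinate is an affine map, so straight lines stay straight;
`collinear` can be used to spot-check this on any three points of a
figure before and after `scale_y`.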

The intuitive meaning of this theorem is that all angles in a figure
less than (greater than) a straight angle are indistinguishable in a
coding for the figure. Thus, for each vertex of a polygon the most one
can determine is whether the polygon is concave or convex at that
vertex. The case where all polygons are convex is simpler than the
general case, so it seemed to me that it would be good to consider a
figure in which as many vertices as possible were convex. If a vertex
is inside a polygon not much can be done in this regard, for the vertex
will then be on at least two polygons (assuming the coding is connected
and has no legs or breaking points) and will be concave on one and convex
on the other. Hence the following definition was framed with the idea
of applying it to polygons which have no part of their perimeter enclosed
by other polygons. I call such polygons outside polygons.

Definition: A is concave in U if for every figure of which U is the
coding every polygon P of which A is a vertex is concave at A.
Thus, if A is a central point in one of the sides of P, it is still
regarded as being concave, since the following figures cannot be joined
at the marked points [figure]. It may seem that there would be
no other concave angles, but this is not so. If U is a coding for this
figure [figure] ... concave angle. For angles on an outside
polygon, the idea was to choose a figure in which that angle is convex,
if possible.

Other examples like the above one convinced me that an angle on an
outside polygon is concave if and only if there is a straight line passing
through it. If there is no such line, it seems one could bend the figure
out without altering the coding. A proof of this is currently lacking,
however. For vertices inside a polygon, I say that vertex is acute if
there are two adjacent simple line segments coming from that vertex in
some figure for which U is the coding which are more than 180° apart.
The idea is that it would be possible to attach a component with that
vertex concave to such a point, but not otherwise. I think it could be
shown in a way similar to the method of establishing the first result
in this paragraph that A is acute if and only if there is no polygon P
such that there is a line from A to each of the vertices of P.

If we now call an angle acute also that is convex but on an outside
polygon, the following condition is necessary and sufficient for the
result of combining all the components to be a coding: if A is a breaking
point in U and U_1, ..., U_n are the components of A, then U is a coding
if and only if each of the U_i is a coding and i) at least n-1 of the U_i
are codings in which A is on the outside of U_i, and ii) at least n-1 of
the U_i are codings in which A is acute. Each U_i is here regarded as
having no legs, and hence each figure for which U_i is a coding will be
enclosed by some polygon, and A is on the outside of U_i if there is a
figure in which A is on the outside polygon. The above unestablished
results would give a way of determining these conditions from the coding
and hence this would be a good way of decomposing a coding with breaking
points into simpler codings.
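
The combination condition for a breaking point can be written as a small
check. This is only an illustrative sketch: the record type and field
names are hypothetical, and the three facts about each component (whether
it is a coding, whether A can be on its outside, whether A is acute in
it) are assumed to be supplied from elsewhere, since the results that
would let one read them off the coding are unproved in the text.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Facts about one component U_i of a breaking point A (assumed given)."""
    is_coding: bool   # U_i is a coding for some figure
    a_outside: bool   # some figure for U_i has A on the outside polygon
    a_acute: bool     # A is acute in U_i


def combined_is_coding(components):
    """Apply the stated condition to the components of one breaking point."""
    n = len(components)
    if not all(c.is_coding for c in components):
        return False
    outside = sum(c.a_outside for c in components)
    acute = sum(c.a_acute for c in components)
    # i) at least n-1 components with A outside; ii) at least n-1 with A acute.
    return outside >= n - 1 and acute >= n - 1
```

For example, two components in which A is outside and acute pass the
check, while two components that both have A strictly inside a polygon
fail condition i).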

The problem now is to determine which of the components are codings.
These have no legs and no breaking points, and thus they are enclosed by
a polygon. In the simple example of a good set that isn't a coding, it
is impossible to draw a figure for it without having to have some point
both inside and outside some polygon. Other examples convinced me this
was a general phenomenon, and so I thought it would be wise to see what
restrictions the coding places on inside/outside relationships. Given
this, there is a natural starting point for a coding that contains no
breaking points or legs, since the polygon which encloses a figure for
which it is a coding will have to be such that it is possible to have
everything inside it. The simple example of a good set which isn't a
coding has no such polygon.

This led me to define P* analogously to A* and component of P
analogously to component of A.

Definition: P* = {simple segments with at least one endpoint on P}.
Notice P itself is included in P*.

Definition: U-P* is the coding obtained from U by deleting all
the elements of P* except that single central vertices aren't
removed.

Definition: U^P is a component of P if U^P = (a component of
U-P*) ∪ {XY : XY in P*, X in ...} or U^P = {XA : A is not on P or any
of the components of U-P*}.

It is now possible to state the following restrictions on outside/inside
which can be determined directly from the coding.

Definition: An assignment of inside/outside to a polygon P in
U is consistent if and only if for A, C not on P, B on P and Q
any other polygon in U the following conditions are satisfied:

1. If B is not a vertex of P and ABC ∈ U, then A is inside
   P if and only if C is outside P.

2. If B is a vertex of P and ABC ∈ U, then P is acute at B
   implies either A or C is outside P and P is concave
   at B implies either A or C is inside P.

3. If AB is an extension of a side of P, then P is acute
   at B if and only if A is outside P and B is concave if
   and only if A is inside P.

4. Points on the same component of P are on the same
   side of P.

5. If P is inside Q, then Q is outside P.

6. If P and Q don't intersect, then either P is outside
   Q or Q is outside P.

7. If two components of P have three points in common,
   then they are on different sides of P.

8. If A_1, A_2, A_3 and A_4 occur in order on P, and A_1 and
   A_3 ∈ U_1 and A_2 and A_4 ∈ U_2, then U_1 and U_2 are on
   opposite sides of P.

9. P is concave at at most n-3 places if P is an n-gon.
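
Condition 9 reflects the fact that a simple polygon always has at least
three convex vertices. A minimal sketch of how one might count the
concave places of a drawn polygon follows. The function names are
hypothetical, the polygon is assumed to be simple and given as a list of
vertices in order, and a central point on a side (a straight angle) is
counted as concave, in line with the definition of concave given earlier.

```python
def signed_area(poly):
    """Shoelace formula: positive for counterclockwise vertex order."""
    total = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def concave_places(poly):
    """Indices of vertices where the polygon is concave or straight."""
    orient = 1 if signed_area(poly) > 0 else -1
    n = len(poly)
    places = []
    for i in range(n):
        ax, ay = poly[i - 1]
        bx, by = poly[i]
        cx, cy = poly[(i + 1) % n]
        # Cross product of the incoming and outgoing edges at vertex i.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross * orient <= 0:
            places.append(i)
    return places


def satisfies_condition_9(poly):
    """True if the polygon is concave at no more than n-3 places."""
    return len(concave_places(poly)) <= len(poly) - 3


# Hypothetical examples: an L-shaped hexagon and a triangle with a
# central point on one side.
L_SHAPE = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
TRIANGLE_WITH_MIDPOINT = [(0, 0), (1, 0), (2, 0), (1, 2)]
```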

These are necessary conditions to be able to draw a figure for U
without violating inside/outside. I don't believe they are sufficient,
and in any case I can't prove that they are, which is the hard part.
I do not see any obvious way of proving that a set of conditions is
sufficient, and this is the main problem. The thing to do would be to
show that it is possible to draw a figure if the conditions are met,
but there is no clear way of going about this. It probably wouldn't
be too difficult to make a list of necessary and sufficient conditions
once one had some idea how to do this.

A different approach to the representation problem would be to start
from one of the well-known axiom systems for Euclidean geometry and see
what conditions a good set of lines must fulfill in order to satisfy these
axioms. However, the axioms apply to particular figures, while the
problem under consideration is to see if there is any possible figure of
which a given coding is a coding. Thus, if one attempted to draw a
figure that had a given coding, the axiom system could tell if and when
this particular attempt went wrong. The only way this would be of help
in solving the representation problem would be if one had a way of
listing a finite number of 'possible figures' for each coding. Given
such a list, it would be possible to determine if there was a figure of
which a given coding was a coding by simply running through all the
possibilities. It seems plausible that for any given coding there is
a method for listing a finite number of figures. Indeed, results like
Theorem 4 should be useful in the attempt to find such a method.
However, there is no natural way of constructing such a list, and it
seemed to me that the approach I tried was more likely to be successful.

The basic idea behind my proposal for solving the problem of abstract
ideas, for the limited context I have considered, is quite simple.
Essentially, it consists of identifying the abstract idea of triangle
with the procedure given in Theorem 1 for recognizing triangles and the
abstract idea of polygon with the procedure for recognizing polygons
given in Theorem 3. The problem with this is that it is obviously
possible to give procedures that are variants of the procedures of
those theorems which have the same end result. It would be completely
arbitrary to single out any of the particular variants as the abstract
idea.

In the following discussion, I will restrict myself to the abstract
idea of triangle, but the same things are true for the abstract idea of
polygon. There are two ways out of the above difficulty. The first way
is to identify the abstract idea with the class of all procedures that
recognize triangles. This has the result that there is only one abstract
idea of triangle, which is the view of Locke, Berkeley and Hume and many
other philosophers. Indeed, it is the belief that there is only one
procedure which justifies the terminology 'the abstract idea.'

It seems to me very unsatisfactory to identify the abstract idea with
a class of any type. Moreover, upon reflection it seems that it is
mistaken to believe that there is only one abstract idea of triangle, and
hence I think the terminology 'the abstract idea' is misleading. As far
as I can see, there is no compelling reason for holding that different
people who can recognize the same property have the same abstract idea.
The property of being a triangle is an objective property of figures,
while the abstract idea is a subjective mental disposition of a device
that can recognize triangles. Thus, the natural thing to do is to speak
of a particular device's abstract idea, and hence to relativize the
notion of abstract idea to particular devices. The correct terminology
would be 'the abstract idea of device A,' not simply 'the abstract idea.'
To justify introducing the phrase 'the abstract idea' would require
conclusive evidence that all devices have the same abstract idea. This
is seemingly going to be impossible to obtain, as there certainly are
different ways to recognize the same property and nothing to indicate
that there is a particular one that all devices happen to use.

This use of the term 'abstract idea' may seem a little strange.
However, since the traditional use presupposes that it is part of the
meaning of 'abstract idea' that for every property there is only one
abstract idea and yet wants to maintain that an abstract idea is mental,
it seems that this use is mistaken. It seems to me that the crucial
notion that must be saved is that an abstract idea is mental (a property
of a device), and hence I have chosen to use 'abstract idea' relative
to different devices. To be technically correct, one should really speak
of 'the abstract idea of device A at time t,' but this is a detail I
shall ignore.

Stated precisely, my proposal is to identify the abstract idea of
a device A with the procedure that A uses to recognize the appropriate
property. For recognizing triangles, such a procedure would not have to
take into consideration the specific properties of any particular
triangle. Certainly the procedure of Theorem 1 doesn't, and for any
other procedure that is similar to it in this regard the problem of the
abstract idea having specific properties, which bothered Berkeley and Hume,
does not even arise.

I will now show that a device B that uses the procedures of Theorems
1 and 3 acts in an intuitively appealing way. It seems to me that
people act in roughly the same way, but this is pure conjecture. First
of all, Hume had great difficulty with the abstract idea of triangle,
but the idea of polygon is even more abstract, and there is no way to
account for such an idea in his theory. The procedure of Theorem 3
solves this problem, since it is B's abstract idea of polygon. Moreover,
this procedure has the very desirable property that it recognizes that
a figure is a polygon before it recognizes how many sides it has. The
desirability of this isn't obvious when one considers triangles, but it
is if one considers Hume's example of a chiliagon, which is a
thousand-sided polygon. Hume points out that people could recognize a
chiliagon, but not by comparing it directly to some picture in their
heads. Rather, they would first recognize that the figure was a polygon,
and then count the sides to see that it had a thousand. If beside the
procedure of Theorem 3, B had the ability to count, then it would proceed
in the same way. This is another reason I believe that the procedure of
Theorem 3 is close to the way people actually operate.

Another gap in Hume's theory is that he has no way of accounting
for the fact that 'triangle' is a special case of the more general
predicate 'polygon.' This is also solved in the case of B. Ordinarily,
B uses the procedure of Theorem 1 to recognize triangles, since it is
much quicker than using the procedure of Theorem 3 to recognize that the
figure is a polygon and then counting to see if it has three sides.
However, both procedures will in fact recognize triangles. The latter
procedure shows that 'triangle' is a special case of 'polygon,' and since
the former is equivalent in that it recognizes the same property, it
too is a special case.

Thus, it seems to me that these procedures completely solve the
problems that worried Berkeley and Hume as far as the limited context
of the codings is concerned. Insofar as the codings resemble the codings
people actually use, it solves the actual problem that Berkeley and Hume
addressed themselves to.

CHAPTER 3
LEARNING THEORY



The purpose of our work in learning theory was to modify and extend
the results Suppes obtained in Stimulus-Response Theory of Finite
Automata (SRTFA).¹ I will give the main results of that paper and
sketch the set up used in its derivation, emphasizing the points of
particular relevance to the present work. I will indicate the
alterations we thought desirable, give our reasons for thinking this,
and then give an account of the work we did.

As the title suggests, Suppes' main concern is to show the
connection between stimulus-response (S-R) theories and finite state
automata (fsa). In S-R theory, an organism learns in a series of
trials. Following Suppes' formulation, what happens at each trial can
be described intuitively in the following way: the organism is in state
of conditioning C at the beginning of the trial, is presented with
stimulus T, samples stimuli s, makes response r, receives reinforcement
e, and goes to the state of conditioning C'. As in SRTFA, I use S to
denote the set of possible stimuli, R to denote the set of responses and
E to denote the set of reinforcements. An individual element of S is
denoted by σ; T and s are in general subsets of S, where s ⊆ T. I will
not go into the details of the general S-R model, for later in this
chapter I give a modified version of this S-R model, and all the details
are spelled out there.

¹ Patrick Suppes, 'Stimulus-Response Theory of Finite Automata,'
Journal of Mathematical Psychology, 6, 3, October, 1969.

The intuitively correct way to make the connection between an S-R
model with a finite number of stimuli and responses and an fsa is to
think of the set of stimuli S as being the input alphabet A of the fsa
and the set of responses R as being its set of states Q. Technically,
this is not quite correct, for the axioms of S-R theory require a
special stimulus s_0 to put the fsa in its initial state r_0, and s_0 is
thus not in A. The rest of the stimuli are. On each trial, the S-R
model gives one response, which is equivalent to the corresponding fsa
making one transition. Since the transition an fsa makes is a function
both of the state it's in and the letter of A it is looking at, the
presented set T on a trial can't consist simply of elements of S, for
elements of S correspond to letters of A. Rather, in order to account
for the fact that a transition depends on the state of the fsa, T must
consist of pairs (r,σ), since elements of R correspond to elements of
Q. Things are still very messy if T has more than one element, so T
is restricted to having one element. If T has one element, the sampling
axioms require that s = T, so one can say simply that a pair (r,σ) is the
stimulus on trial n without worrying whether it is s or T. The intuitive
meaning of the pair (r,σ) is that r is the organism's previous response
and σ is the present stimulus.

There are two other important features of the set up of SRTFA.
First, the set of reinforcements must contain a reinforcement e_r for
each element r of R. I will discuss this feature later. Secondly, the S-R
model in SRTFA is an all-or-none conditioning model, i.e., there are only
two possible states of conditioning for each pair (r,σ): either (r,σ)
is unconditioned, in which case there is a positive probability of giving
each response, or it is conditioned to some response r', in which case
r' is given with probability 1. If (r,σ) occurs and is unconditioned
and e_r' occurs, there is a positive probability c of (r,σ) becoming
conditioned to r', and if (r,σ) is already conditioned, it remains
conditioned. Notice that in this set up no states are ever conditioned
incorrectly, since e_r' always occurs after (r,σ) if r' is the correct
response. Once all the pairs (except those containing s_0) are conditioned,
the organism will behave exactly like an fsa. Indeed, one can use the
conditioning table to construct the transition matrix of the fsa. On
this intuitive basis, it is possible to formally define what it means
for an S-R model to become an fsa. Suppes does this, and using the
ordinary notion of isomorphism between automata, he proves that for any
connected fsa there is an S-R model with all its states initially
unconditioned that asymptotically becomes isomorphic to it. The key to
the proof is to show that for some number n there is a positive probability
that each unconditioned state will occur in each sequence of n trials.
Since, on each occurrence, there is a positive probability that it will
be conditioned, it will eventually get conditioned. The details are
similar to those that are given later.
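
The all-or-none set up just described can be illustrated by a small
simulation. This is a sketch under assumptions of my own, not Suppes'
formal model: the function name, the uniform choice of stimuli and of
unconditioned responses, the value of c, and the two-state target table
are all hypothetical, and the special initial stimulus is ignored.

```python
import random


def simulate_all_or_none(target, states, letters, c=0.3,
                         max_trials=10_000, seed=0):
    """Condition pairs (r, sigma) one at a time toward a target fsa.

    target maps each pair (previous response, stimulus) to the correct
    response. Returns the conditioning table and the number of trials used.
    """
    rng = random.Random(seed)
    conditioned = {}
    r = states[0]  # assumed initial response
    for trial in range(1, max_trials + 1):
        sigma = rng.choice(letters)
        pair = (r, sigma)
        if pair in conditioned:
            response = conditioned[pair]      # given with probability 1
        else:
            response = rng.choice(states)     # every response is possible
            # The reinforcement for the correct response always occurs,
            # and the pair becomes conditioned to it with probability c.
            if rng.random() < c:
                conditioned[pair] = target[pair]
        r = response
        if len(conditioned) == len(target):
            return conditioned, trial
    return conditioned, max_trials


# Hypothetical connected target: state changes exactly when letter 2 is read.
TARGET = {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 1}
table, trials_used = simulate_all_or_none(TARGET, (1, 2), (1, 2))
```

Because a pair is only ever conditioned to the reinforced response, every
entry of the table agrees with the target at every stage; once the table
is full it is the transition table of the target automaton.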

It is particularly interesting in view of the present work that
the result remains essentially unchanged if a linear learning model is
used, i.e., if the probability of responding r' when presented with
(r,σ) is p_r', and e_r' occurs, then the probability of responding r' the
next time (r,σ) occurs is p_r'(1-θ) + θ, 0 < θ < 1, and for r'' ≠ r', the
probability is (1-θ)p_r''. Thus, the probability of giving the correct
response increases each time (r,σ) occurs. In fact, it approaches 1
as the number of times (r,σ) occurs approaches infinity, and hence a
model of this type, though messier, will also at asymptote become the
correct fsa.
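
The approach to 1 can be made explicit. If p is the probability of the
reinforced response and the update is p(1-θ) + θ, then after n
occurrences the probability is 1 - (1-θ)^n (1 - p), which follows by
induction on n. The sketch below checks this numerically; the function
names and the sample values are assumptions made for illustration.

```python
def linear_update(probs, reinforced, theta):
    """One step of the linear model on a dict response -> probability."""
    return {
        resp: p * (1 - theta) + (theta if resp == reinforced else 0.0)
        for resp, p in probs.items()
    }


def after_n(probs, reinforced, theta, n):
    """Apply the linear update n times with the same reinforced response."""
    for _ in range(n):
        probs = linear_update(probs, reinforced, theta)
    return probs


def closed_form(p0, theta, n):
    """Probability of the reinforced response after n updates."""
    return 1 - (1 - theta) ** n * (1 - p0)


start = {"r1": 0.5, "r2": 0.5}
result = after_n(start, "r1", theta=0.2, n=10)
```

The update multiplies every probability by (1-θ) and adds θ to the
reinforced response, so the probabilities continue to sum to 1.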

The main reason we felt that modification was desirable is that
the learning that takes place in this set up is of a rather simple
nature. The organism learns each appropriate response independently,
and hence doesn't have to learn the task as a whole. Such a model is
not adequate to account for most human learning, since in the typical
experimental case with a human subject, the subject discovers a method
of doing the task on his own (which may remain unknown to the experimenter)
and it is not necessary that the experimenter should provide a particular
method of doing this, which in effect is what he is doing if he
reinforces the responses of a particular fsa. In particular, it is not
adequate for the learning of geometrical predicates in the way we had
in mind. As mentioned in Chapter 2, we were thinking of a device
learning a predicate on a series of trials where on each trial a figure
is presented, a yes or no answer elicited, and reinforcement given
according to whether or not the answer is correct, which is determined
by whether or not the drawing has the predicate in question. Thus, there
are only two different reinforcements, not as many different reinforcements
as responses. Clearly, it will take an fsa with many states to recognize
interesting geometric predicates, and hence just as many responses must
be possible in the S-R model. It is easy to regard a model with many
responses as giving only yes and no answers, since all that is required
is to partition the set of responses into two sets, R_y (for yes) and
R_n (for no), just as in automata theory the set of states is partitioned
into the set of final states and its complement. The problem is that only
one response is made, while the fsa obviously requires several transitions
to get an answer. Thus, we have to regard the organism as making
internal responses. There is in general nothing wrong with this, since
some such activity is obviously going on, but it is a real problem in the
framework of SRTFA, which requires that every response be reinforced,
since it is impossible to reinforce an internal response. Finally, there
is one other reason it is undesirable to have to single out a particular
fsa beforehand. What one is really interested in is the S-R model
eventually learning to recognize the predicate in question, and one is
indifferent as to how this is done as long as the method is reasonably
efficient. Since there are many fsa's that can do any given task, more
than one fsa is in general acceptable.

To get an adequate model for the type of learning we wanted requires
changes in the set up of SRTFA. First, the notion of trial has to be
altered to allow more than one stimulus to be sampled and more than one
response given on each trial. Intuitively, a trial now consists of the
fsa processing a whole tape, rather than making a single response. In
the case of a rat running a maze, for example, a trial now consists of
the rat going through the entire maze, instead of making a choice at a
certain branching point. The way to accomplish this formally is to
introduce the notion of subtrial, and to consider a trial to consist of
a series of subtrials. A subtrial corresponds to the trial of SRTFA as
far as sampling the stimuli and responding are concerned. This will be
evident from the axioms, for the sampling and response axioms are exactly
the same as those of SRTFA except that s_n, T_n, and r_n (the sampled
stimuli, presented set, and response, respectively, on trial n) are
replaced by the corresponding notions s_n,m, T_n,m, and r_n,m (the sampled
stimuli, presented set, and response on trial n, subtrial m). In this
set up, it is convenient to think of the organism as having given initial
response r_0 at the beginning of each trial, and thus ignore s_0.

The organism's final response on each trial is regarded as its
answer. The intuitive idea is that the final response corresponds to
the answer that the subject gives, while the previous responses are
internal. There are only two reinforcements, correct and incorrect, and
whether the last response is in R_y or R_n is the only factor which
determines which reinforcement is given. The previous responses are
necessary for the organism to know what the final response should be, but
do not affect the reinforcement. Formally, let m_n be the last subtrial
on trial n, and e_+ and e_-, respectively, be the positive and negative
reinforcements. Then r_n,m_n, the response on trial n, subtrial m_n, is
the final response on trial n. Reinforcement depends only on r_n,m_n;
e_+ occurs if and only if response r_n,m_n is correct, i.e., if a yes
answer is correct, then r_n,m_n ∈ R_y, and if a no answer is correct,
then r_n,m_n ∈ R_n.
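
The revised notion of trial can be sketched as follows. This is an
illustration only: the function names, the strings "e+" and "e-", and
the sample transition table are assumptions, and a fixed table stands in
for whatever responses the organism would actually give.

```python
def run_trial(table, tape, r0):
    """Return the response on each subtrial of one trial.

    table maps (previous response, stimulus) to a response; tape is the
    sequence of stimuli presented on the successive subtrials.
    """
    responses = []
    r = r0
    for sigma in tape:
        r = table[(r, sigma)]
        responses.append(r)
    return responses


def reinforcement(final_response, yes_responses, yes_is_correct):
    """Positive reinforcement iff the final response gives the right answer."""
    answered_yes = final_response in yes_responses
    return "e+" if answered_yes == yes_is_correct else "e-"


# Hypothetical two-state table and partition of the responses.
TABLE = {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 1}
R_YES = {1}

internal_and_final = run_trial(TABLE, (1, 2), r0=1)
outcome = reinforcement(internal_and_final[-1], R_YES, yes_is_correct=False)
```

Only the last entry of the list returned by `run_trial` enters into the
reinforcement; the earlier entries play the part of internal responses.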

This set up has the property that all fsa's that can do the given
task are equally acceptable. It does not require that a particular fsa
be singled out as does SRTFA. What we do require is that the alphabet,
set of states, initial state and set of final states be given, and that
there is in fact an automaton with these four components that can do the
task that is to be learned. This is accomplished formally by introducing
the concept of a signature. The only thing we don't require is a
particular transition table. In SRTFA, the things we require are needed
to choose an appropriate S-R model, while the transition table is needed
to choose the appropriate reinforcement schedule. We have completely
changed the method of reinforcement so that we don't need the transition
table, but we need the other things. We still think of learning as being,
in a sense, the construction of a correct transition table, but this
construction must be accomplished with less information. Moreover, we
don't have to worry whether or not the asymptotic fsa is connected.
Indeed, we don't even require that all states be conditioned at asymptote.
If at asymptote the S-R model is an fsa with inaccessible or unconditioned
states but that can do the task, we are perfectly happy.

There is a basic problem with this set up; namely, how many states
should be available at the beginning of the learning sequence? There is
no problem with the alphabet, since it consists of the stimuli, and once
the number of states is determined, it is fairly easy to choose a
particular set of states, initial state and set of final states. However,
there are no general results in automata theory that are helpful in
deciding how many states are needed to do a particular task, and we made
no progress in this direction. Even if such results were available, it
is not clear how to use them, since one wants the organism to decide on
the set of states itself (particularly since most of them are internal),
but it is not plausible that the organism would have these results
available to help it. Thus, the best approach might be to have the
automaton start out with a small number of states and add new ones if
these don't prove sufficient. The course we took was to sidestep this
problem and simply assume that enough states are present. It is clear that
having unnecessary states available will greatly reduce the rate of
learning, but they don't affect the asymptotic results we were concerned
with.

Our first step in approaching this problem was to concentrate our
attention on the simplest possible non-trivial automata, since we thought
(correctly, as it turns out) that all the conceptual problems would show
up even in this case. These automata have two-letter alphabets (σ_1 and
σ_2), two states (r_1 and r_2), and one final (acceptance) state (r_1). We
further required that all input tapes have length two. There are thus four
different input tapes, and 16 ways to partition these into acceptable and
unacceptable inputs. There are some interesting features in this set up.
If one takes r_1 as the starting state, as we did originally, and takes
{σ_1σ_1, σ_2σ_2} to be the set of acceptable tapes, then there are two
automata that will accept it, i.e., whose final response will be r_1 when
presented with σ_1σ_1 or σ_2σ_2, and r_2 otherwise. These are

[figure: the two automata]

Notice that each connection is different, but that the final result
is the same. Secondly, it is not satisfactory to simply take r_1 as the
starting state, since there are acceptance sets, e.g., {σ_1σ_2, σ_2σ_1,
σ_2σ_2}, which are accepted by no automaton (of the type we are dealing
with) with initial state r_1, but are accepted by one with initial state
r_2. It would be possible to add another state, but this is a complication
it is best to avoid. We decided that the best way to meet this problem
would be to first try to find an automaton with initial state r_1 that
would work, and if this fails, look for one with initial state r_2. This
method requires the organism to only be trying to construct one transition
table at a time, which seems desirable. Finally, there are two sets,
{σ_1σ_1, σ_1σ_2} and {σ_2σ_1, σ_2σ_2}, which are not accepted by any
two-state automaton. Intuitively, these sets are very easy to recognize,
since the second element on each tape is irrelevant. The obvious thing to
do to solve this difficulty would be to try to get a method of recognizing
irrelevant information, and have the organism apply this to the stimuli
first before trying to construct a transition table. We wanted to
concentrate on automaton learning, however, so we did not use these two
sets as acceptable sets.
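
The counts in this two-state case are small enough to check by listing
all 16 transition tables. The sketch below does so. The encoding is an
assumption made for illustration: states and letters are both written as
the integers 1 and 2, state 1 is the final state, a tape is a pair of
letters, and the function names are hypothetical.

```python
from itertools import product

STATES = (1, 2)
LETTERS = (1, 2)
FINAL = 1
TAPES = tuple(product(LETTERS, repeat=2))  # the four tapes of length two


def all_tables():
    """Yield every transition table as a dict (state, letter) -> state."""
    keys = tuple(product(STATES, LETTERS))
    for values in product(STATES, repeat=len(keys)):
        yield dict(zip(keys, values))


def run(table, tape, start):
    """State reached after processing the tape from the start state."""
    r = start
    for sigma in tape:
        r = table[(r, sigma)]
    return r


def accepted_set(table, start):
    """Set of tapes on which the automaton ends in the final state."""
    return frozenset(t for t in TAPES if run(table, t, start) == FINAL)


def tables_accepting(target, start):
    """All tables whose accepted set, from this start state, is the target."""
    target = frozenset(target)
    return [t for t in all_tables() if accepted_set(t, start) == target]


def unrecognizable_sets():
    """Acceptance sets matched by no table with either start state."""
    reachable = {accepted_set(t, s) for t in all_tables() for s in STATES}
    masks = product((False, True), repeat=len(TAPES))
    subsets = [frozenset(t for t, keep in zip(TAPES, m) if keep) for m in masks]
    return [s for s in subsets if s not in reachable]
```

Running `tables_accepting({(1, 1), (2, 2)}, 1)` returns two tables that
disagree on every transition, and `unrecognizable_sets()` returns exactly
the two sets in which only the first letter of the tape matters.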

It is easy to see the difference between the learning procedure of
SRTFA and the one we want in terms of this simple case. In both cases,
the object is to construct a transition table, but in SRTFA particular
transitions are reinforced, while in our case the organism is only told
which tapes are acceptable. This does not necessarily give information
about any particular transitions even in the two-state case, as my first
example shows, and things, of course, get worse as the number of states
increases. The learning procedure will have to be such that either
of the two transition tables in my example could result from reinforcing
r_1 responses to σ_1σ_1 and σ_2σ_2, and it is not clear what natural
learning procedure could have this result.

One method of doing this would be to simply list all the possible 
transition tables, try each (or some) of them on each tape, and discard 
the ones that donH work. If more than one were left after this process 
were completed, one could be chosen arbitrarily. This method has two 
very nice features: it learns quickly if there is a transition table 
that will work, and it has a method of dete mining when there isn*t one 
that will work, which is very good for adding new states or trying a 
different starting state. The problem is that listing all the possible 
transition tables is a very sophisticated procedure and contrary to the 
intuitive notions of learning* In particular, there is no direct way 
to formulate such a procedure in S-R theory. We came up with variations 
of this procedure that donH seem as counter- intuitive as a list of all 
the possibilities, but finally decided that any type of eniimeration 
procedure was undesirable- Another method, which is much closer to our 
final approach, is the following, vhich we called the brute force method. 
Each state is either conditioned or unconditioned, as in SRTFA. If an 
unconditioned state is entered in processing an input tape, there is 
a positive probability of responding r_1 and also of responding r_2. If 
the response to the tape is correct, the state becomes conditioned to the 
response it actually gave. There is a problem with tapes σ_1σ_1 and σ_2σ_2 
since, if these are the input tapes, the same state may be entered twice. 
If the organism acts as a probabilistic fsa, it could respond r_1 one 
time and r_2 the other, and still get the correct answer. In this case, 
by the above conditioning rule, the same state would have to be 
conditioned to different responses. Since this can't happen, either the 
conditioning rule must be changed, or the organism cannot be regarded 
as acting like a probabilistic fsa. For brute force, we didn't want to 
change the conditioning rule, so we decided that on a given trial 
responses when a given state is reentered will be determined by the 
response given the first time the state was entered. Even with this 
conditioning rule, it is obvious that some states could be conditioned 
incorrectly. Thus, it was necessary to introduce a deconditioning rule. 
For brute force, we decided that whenever a wrong answer is given, all 
states will be deconditioned. Thus, the organism will have to start 
over. This method will eventually learn, since once all the states are 
correctly conditioned, they will never become unconditioned. Brute force 
lacks the good features of enumeration, but it does have a simple learning 
procedure. 
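
The brute force method can be illustrated by a small simulation. What 
follows is only a sketch under stated assumptions: the encoding of a 
"state" as a (state, letter) pair, the function names run_trial, 
is_learned and brute_force_learn, the uniform guessing probabilities, the 
use of state 0 as both initial and accepting state, and the two-state, 
two-letter task are all choices made for the illustration and are not 
taken from the text. 

```python
import random

def run_trial(table, tape, rng, n_states=2, start=0):
    """Process one tape; return the final state and the responses used."""
    state, used = start, {}
    for letter in tape:
        key = (state, letter)
        if key in table:        # conditioned: respond deterministically
            nxt = table[key]
        elif key in used:       # reentered on this trial: repeat first response
            nxt = used[key]
        else:                   # unconditioned: guess (uniformly, by assumption)
            nxt = rng.randrange(n_states)
        used[key] = nxt
        state = nxt
    return state, used

def is_learned(table, tapes, accept, finals, start=0):
    """True if every tape is handled deterministically and answered correctly."""
    for tape in tapes:
        state = start
        for letter in tape:
            if (state, letter) not in table:
                return False
            state = table[(state, letter)]
        if (state in finals) != (tape in accept):
            return False
    return True

def brute_force_learn(tapes, accept, finals, rng, max_trials=100_000):
    """Condition on a correct answer; decondition everything on a wrong one."""
    table = {}
    for trial in range(1, max_trials + 1):
        tape = rng.choice(tapes)
        final, used = run_trial(table, tape, rng)
        if (final in finals) == (tape in accept):
            table.update(used)  # entered states become conditioned
        else:
            table.clear()       # wrong answer: all states are deconditioned
        if is_learned(table, tapes, accept, finals):
            return table, trial
    return table, None

tapes = ["11", "12", "21", "22"]
table, trials = brute_force_learn(tapes, accept={"11", "22"}, finals={0},
                                  rng=random.Random(0))
```

In this hypothetical task exactly two transition tables answer all four 
tapes correctly, and the simulation stops at whichever of them it reaches 
first. 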

The next thing we did was to reformulate two features of brute-force 
learning to make it more like an S-R model. We reversed our previous 
decision and decided that it would be better to change the conditioning 
rule and regard the organism as acting like a probabilistic fsa. The way 
to do this is simply to introduce an order in which states are 
conditioned. We took the natural course of specifying that, after 
reinforcement, unconditioned states are conditioned in the order they 
were entered on the trial, the first such state being conditioned first. 
The second thing was to say that only conditioned states that were used 
on the given trial are subject to deconditioning. These changes make it 
possible to write axioms very similar to those of SRTFA which lead to 
the desired asymptotic result. 

We considered two different kinds of conditioning in this framework: 
all-or-none and linear. In the case of linear conditioning where there 
are only two responses, the conditioning and deconditioning procedures 
are similar. Indeed, deconditioning looks exactly like conditioning for 
us, since we took the deconditioning parameter to be equal to the 
learning parameter θ. The linear method works in the following way. 
Suppose at the beginning of a trial the probability of responding r_k is 
p_k when in state (r_i,σ_j), where i, j and k = 1 or 2. If (r_i,σ_j) is 
entered and r_k is given, then if the final response is correct, the 
probability of responding r_k the next time (r_i,σ_j) is entered is 
p_k(1-θ) + θ, while if the response is incorrect, this probability is 
p_k(1-θ). Once the original probabilities are specified, this is 
sufficient to determine all the probabilities, since there are only two 
possible responses, the probabilities of which must sum to one. It is 
possible for a 
certain response to be both incremented and decremented on the same 
trial, e.g., if σ_1σ_1 is the presented tape and it should be accepted, 
and the initial state is r_1, then it is possible for the responses to be 
r_1 and then r_2. This answer is incorrect, so the transition from (r_1,σ_1) 
to r_1 is decremented because of the first response and incremented 
because of the second one. In this set-up, incorrect responses get 
reinforced, and it is not clear whether or not the probabilities of all 
the transitions will converge to 0 or 1 as the number of trials increases. 
It seems as though they might not in the case where {σ_1σ_1, σ_2σ_2} is the 
acceptable set, since there are two possibilities that have all connections 
different, so we chose to concentrate on a case where this doesn't happen. 
The case we chose is where the acceptable set is {σ_1σ_1}. In this case, 
there is only one possibility (which is given by the following transition 
table), for which we came up with special names for the transition 
probabilities (given in the second table): 
The states are numbered from 1-4, and α_i is the probability of giving 
a correct response when in state i. With this notation, it is easy to 
construct the following table, which tells both which combinations of 
responses result in correct answers, and what the probabilities of such 
combinations are: 
From this table, it is easy to see that one of the α_i will converge to 1, 
since it is always correctly reinforced (it occurs only in the correct 
column, and its complement only in the incorrect column). Assuming that 
each tape occurs with equal probability, the occurrence of products of the 
form β_iα_k in the incorrect column means that each of the remaining three 
α_i can't converge to 1 unless another of them does. This can easily be 
checked by computing the expectations. Thus, they all converge together, 
or none do. Moreover, they tend to cluster together, high α_i's pulling low 
ones up and vice versa. What I tried to do was show that if all three got 
within some distance of 1, they would converge to 1. It seemed that some 
such procedure would be necessary to take care of the cases where there is 
more than one possible transition table. I couldn't come up with anything, 
and, in fact, I soon became convinced they didn't converge. I knew no 
good way to prove this, and since it became apparent that the all-or-none 
model would converge, we simply dropped the linear conditioning model. 
Moreover, it seems that even a model with stronger tendencies to converge, 
such as Luce's beta model, won't help, since there is just about as 
strong a tendency for incorrect responses to be reinforced as correct 
responses. In retrospect, it seems that the reason the brute-force method 
works is that sooner or later the correct responses get conditioned and 
once this happens only correct answers are given; thus, no negative 
reinforcements occur, and hence no states are deconditioned. This can't 
occur in models that have to have their probabilities converge to 1. 
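
The linear rule, and the way one response can be both decremented and 
incremented on a single trial, can be sketched numerically. This is a 
hedged illustration only: the function names, the parameter values, and 
the assumption that the two adjustments are applied one after the other 
within the trial are mine, and the increment is taken in the standard 
linear-operator form p(1-θ) + θ. 

```python
def linear_update(p_given, theta, correct):
    """Linear rule applied to the probability of the response that was given."""
    if correct:
        return p_given * (1 - theta) + theta  # incremented toward 1
    return p_given * (1 - theta)              # decremented toward 0

def double_use(p1, theta):
    """Same state entered twice; responses r_1 then r_2; final answer incorrect."""
    # First response r_1 was given and the answer is wrong: P(r_1) is decremented.
    p1 = linear_update(p1, theta, correct=False)
    # Second response r_2 was given and the answer is wrong: P(r_2) is decremented,
    # which raises P(r_1) = 1 - P(r_2).
    p2 = linear_update(1 - p1, theta, correct=False)
    return 1 - p2

result = double_use(0.5, 0.2)  # 0.5 -> 0.4 -> 0.52
```

With two responses, decrementing one response is the same operation as 
incrementing the other, which is why deconditioning looks exactly like 
conditioning when the two parameters are equal. 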

In the all-or-none conditioning model, each state is in one of two 
situations: conditioned, in which case it is conditioned to some 
response r_i, and unconditioned. If it is conditioned, then it responds 
with the response it is conditioned to with probability 1, and if it is 
unconditioned, there is a constant positive probability of responding 
with any of the possible responses. Thus, the only difference between the 
revised brute-force method and the all-or-none conditioning model is that 
the latter has conditioning and deconditioning parameters c and d, 
respectively, 0 < c, d ≤ 1. Instead of all states that are entered on a 
given trial being conditioned with probability 1 when a correct answer is 
given, as in brute force, they are conditioned with probability c. 
Similarly, when an incorrect answer is given, the entered states are 
deconditioned with probability d. 
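
The end-of-trial bookkeeping of the all-or-none model can be sketched as 
follows. This is a minimal sketch, not the formal model: the function 
name all_or_none_update, the dictionary representation of the 
conditioning function, and the list of (state, response) pairs are 
assumptions made for the illustration. Setting c = d = 1 gives the 
revised brute-force rule. 

```python
import random

def all_or_none_update(table, entered, correct, c, d, rng):
    """Return the conditioning table after one trial.

    table   : dict mapping a state to the response it is conditioned to
    entered : list of (state, response) pairs in the order they occurred
    correct : whether the final answer on the trial was correct
    """
    new_table = dict(table)
    for state, response in entered:
        if correct:
            # Unconditioned entered states are conditioned, in order of entry,
            # to the response actually given, each with probability c.
            if state not in new_table and rng.random() < c:
                new_table[state] = response
        else:
            # Conditioned entered states are deconditioned with probability d;
            # unconditioned states are never conditioned on a wrong answer.
            if state in new_table and rng.random() < d:
                del new_table[state]
    return new_table

rng = random.Random(1)
state = ("r1", "s1")
after_correct = all_or_none_update({}, [(state, "r1"), (state, "r2")],
                                   True, 1.0, 1.0, rng)
after_wrong = all_or_none_update(after_correct, [(state, "r1")],
                                 False, 1.0, 1.0, rng)
```

Because states are treated in the order they were entered, a state that 
is entered twice with different responses is conditioned to the first 
response whenever the first attempt at conditioning succeeds. 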
The problem now is to formalize the correct S-R model and prove the 
desired asymptotic result. In the present S-R model, the stimuli and 
responses are treated the same as in SRTFA, the set S of stimuli and set 
R of responses both being primitive concepts. In the present set-up, 
however, we need an added primitive concept, that of R_y, which is a 
specified subset of R. Intuitively, if the final response on a trial is 
in R_y, then the model is regarded as having responded yes, and if the 
response is in R_n = R - R_y, then the model is regarded as having responded 
no. The set E of reinforcements is also primitive, but it contains only 
two elements, e_1 and e_2, rather than having an element corresponding to 
each response as in SRTFA. The fifth primitive concept is a measure μ 
on the set of stimuli, and is exactly the same as in SRTFA. The concept 
of subtrial requires the introduction of a new primitive concept M, which 
is a sequence of positive integers m_n. Each m_n indicates the number of 
subtrials on trial n. This notion is necessary in defining the next 
primitive concept, that of the sample space X. Each element of X 
represents a possible experiment, i.e., an infinite sequence of trials, 
where each trial n has m_n subtrials. Each trial is an (m_n + 2)-tuple, 
consisting of three things: 1) the conditioning function at the beginning 
of the trial, which is a partial function from S into R, where C(σ) = r 
means σ is conditioned to r and C(σ) undefined means σ is unconditioned; 
2) m_n triples of the form (T,s,r), each of which represents the presented 
set, sampled stimuli and response on a subtrial; and 3) the reinforcement 
which occurred. The seventh and final primitive concept is the 
probability measure P on the appropriate Borel field of cylinder sets of 
X, which is easily defined since there are only a finite number of stimuli 
and responses. All probabilities must be defined in terms of P. 

Some notation is needed to take us back and forth between elements 
or subsets of the sets of stimuli, responses, and reinforcements and 
events of the sample space X. I will follow the notation of SRTFA as 
closely as possible. T_{n,m} is the event of set T being presented on trial 
n, subtrial m, i.e., it is the set of all elements of X that have T as 
the presented set on trial n, subtrial m. When this notation is used, I 
always suppose that 1 ≤ m ≤ m_n. s_{n,m} and r_{n,m} are defined analogously. 
There is no need to mention subtrial when speaking of reinforcement 
and conditioning. Thus, e_{i,n} is the subset of elements of X in which 
e_i occurs on trial n. C_n is the event of conditioning function C 
occurring on trial n. I will write σ ∈ C to mean σ ∈ domain(C), and 
σ ∈ C_r to mean C(σ) = r. 

For each possible experiment x in X, and each element z of a trial of 
x (either a conditioning function, a triple of the form (T,s,r) or a 
reinforcement), Y(z) is the pattern of events preceding (and including) z, 
i.e., Y(z) is the set of all elements of X that are the same as x up to 
and including z. I will write Y(T_{n,m}, s_{n,m}, r_{n,m}) simply as Y(n,m). 

Finally, conditioning takes place all at once in this model, but it 
is necessary to think of states as being (possibly) conditioned in the 
order they occur on a trial. This is most convenient to state if we 
introduce the notation C^m_n for the conditioning function on trial n after 
response m has (possibly) been conditioned. I use superscripts, since 
C^m_n is not explicitly a part of the sample space X, unlike T_{n,m}, s_{n,m}, 
r_{n,m} and C_n. Also, C^{m_n}_n = C_{n+1}. 

In the following axioms, it is assumed that all events on which 
probabilities are conditioned have positive probability. For example, 
the tacit hypothesis of S2 is that P(T_{n,m}) > 0 and P(T_{n',m'}) > 0. There 
are three kinds of axioms: sampling axioms, conditioning axioms, and 
response axioms. A verbal formulation of each axiom is given together 
with its formal statement. 

Definition: A structure 𝒮 = (S,R,R_y,E,μ,M,X,P) is an S-R 
model if and only if the following axioms are satisfied: 

Sampling Axioms. 

S1. P(μ(s_{n,m}) > 0) = 1. 

(On every subtrial a set of stimuli of positive measure 
is sampled with probability 1.) 

S2. P(s_{n,m} | T_{n,m}) = P(s_{n',m'} | T_{n',m'}). 

(If the same presentation set occurs on two different subtrials, 
then the probability of a given sample is independent of the 
subtrial number.) 

S3. If s ∪ s' ⊆ T and μ(s) = μ(s'), then P(s_{n,m} | T_{n,m}) = P(s'_{n,m} | T_{n,m}). 

(Samples of equal measure that are subsets of the presentation 
set have an equal probability of being sampled on a given 
subtrial.) 

S4. P(s_{n,m} | T_{n,m}, Y(n,m)) = P(s_{n,m} | T_{n,m}). 

(The probability of a particular sample on trial n, subtrial m, 
given the presentation set of stimuli, is independent of any 
preceding pattern Y(n,m) of events.) 
Conditioning Axioms. 

C1. If r, r' ∈ R, r ≠ r' and C_r ∩ C_{r'} ≠ ∅, then P(C_n) = 0. 

(On every trial with probability 1 each stimulus element is 
conditioned to at most one response.) 

C2. P(σ ∈ (C^{m+1}_n)_r | σ ∈ s_{n,m}, σ ∉ C^m_n, r_{n,m} = r, e_{1,n}, Y(n,m)) = c. 

(If e_1 occurs on trial n, the probability is c of any previously 
unconditioned stimulus that is sampled on a subtrial becoming 
conditioned to the response given on that subtrial, and this 
probability is independent of the particular subtrial and any 
preceding pattern of events Y(n,m).) 

C3. P(σ ∈ (C^{m+1}_n)_r | σ ∈ s_{n,m}, σ ∉ C^m_n, r_{n,m} ≠ r, e_{1,n}, Y(n,m)) = 0. 

(If e_1 occurs on trial n, the probability is 0 of any previously 
unconditioned stimulus that is sampled on a subtrial becoming 
conditioned to a response different from the one given on that 
subtrial, and this probability is independent of the particular 
subtrial and any preceding pattern of events Y(n,m).) 

C4. P(σ ∈ (C^{m+1}_n)_r | σ ∈ s_{n,m}, σ ∈ (C^m_n)_r, e_{1,n}, Y(n,m)) = 1. 

(If e_1 occurs on trial n, the conditioning of previously 
conditioned sampled states remains unchanged.) 

C5. P(σ ∈ C^{m+1}_n | σ ∈ s_{n,m}, σ ∉ C^m_n, e_{2,n}, Y(n,m)) = 0. 

(If e_2 occurs on trial n, the probability is 0 of a previously 
unconditioned stimulus that is sampled on a subtrial becoming 
conditioned.) 

C6. P(σ ∉ C^{m+1}_n | σ ∈ s_{n,m}, σ ∈ C^m_n, e_{2,n}, Y(n,m)) = d. 

(If e_2 occurs on trial n, the probability is d of any previously 
conditioned stimulus that is sampled on a subtrial becoming 
unconditioned, and this probability is independent of the 
particular subtrial and any preceding pattern of events 
Y(n,m).) 

C7. P(σ ∈ (C^{m+1}_n)_r | σ ∉ s_{n,m}, σ ∈ (C^m_n)_r) = 1. 

(With probability 1, the conditioning of unsampled stimuli 
does not change.) 
Response Axioms. 

R1. If ∪_{r∈R} C_r ∩ s ≠ ∅, then 

P(r_{n,m} | C_n, s_{n,m}, Y(n,m)) = μ(s ∩ C_r) / μ(s ∩ ∪_{r∈R} C_r). 

(If at least one sampled stimulus is conditioned to some 
response, then the probability of any response is the ratio 
of the measure of sampled stimuli conditioned to this response 
to the measure of all the sampled conditioned stimuli, and this 
probability is independent of any preceding pattern Y(n,m) of 
events.) 

R2. If ∪_{r∈R} C_r ∩ s = ∅, then there is a number ρ_r such that 

P(r_{n,m} | C_n, s_{n,m}, Y(n,m)) = ρ_r. 

(If no sampled stimulus is conditioned to any response, then 
the probability of any response r is a constant guessing 
probability ρ_r that is independent of n and any preceding 
pattern Y(n,m) of events.) 

As indicated earlier, the sampling and response axioms are exactly 
the same as in SRTFA, except that the concept of trial has been replaced 
by that of subtrial. Only the conditioning axioms have had to be changed 
to ensure the desired learning. 
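
The two response axioms can be illustrated by a small function that 
returns the response probabilities on a single subtrial. This is a sketch 
under assumptions: the name response_distribution, the dictionary 
representation of the conditioning function and of the guessing 
probabilities, and the default choice of cardinality as the measure are 
introduced only for the example. 

```python
from fractions import Fraction

def response_distribution(sample, conditioning, guess, measure=len):
    """Response probabilities on one subtrial.

    sample       : set of sampled stimuli
    conditioning : dict stimulus -> response (absent means unconditioned)
    guess        : dict response -> guessing probability (used by R2)
    measure      : measure on sets of stimuli (cardinality by default)
    """
    conditioned = {s for s in sample if s in conditioning}
    if not conditioned:
        return dict(guess)  # Axiom R2: nothing sampled is conditioned
    total = measure(conditioned)
    dist = {}
    for r in guess:
        part = {s for s in conditioned if conditioning[s] == r}
        dist[r] = Fraction(measure(part), total)  # Axiom R1: ratio of measures
    return dist

guess = {"r1": Fraction(1, 2), "r2": Fraction(1, 2)}
conditioning = {"a": "r1", "b": "r1", "c": "r2"}
mixed = response_distribution({"a", "b", "c", "x"}, conditioning, guess)
unconditioned = response_distribution({"x"}, conditioning, guess)
```

When every sample is a singleton, as in the special models used below, R1 
reduces to giving the conditioned response with probability 1. 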

I will use only a very special kind of S-R model, one that has a 
natural relationship to fsa's. Before specifying the restrictions that 
are necessary, something must be said about fsa's. In the following, 
i, k and l are used as subscripts for states of an fsa and responses, 
hence 1 ≤ i, k, l ≤ h, and j is used as a subscript for letters of the 
alphabet of an fsa and stimuli, hence 1 ≤ j ≤ g. 
Definition: The quadruple v = (g,h,p,H) is a signature if 
g, h and p are positive integers, 1 ≤ p ≤ h, and H ⊆ {1,2,...,h}. 
Definition: If v = (g,h,p,H) is a signature, then 𝒟(v) = 
{D: D is a probabilistic fsa with alphabet A containing g 
elements (denoted by a_1,...,a_g), set of states Q containing h 
elements (denoted by q_1,...,q_h), initial state q_p and set of 
final states F, where q_i ∈ F ⇔ i ∈ H, such that for all i 
and j, when D is in state q_i and scanning a_j, either it makes the 
transition to some q_k with probability 1 (in which case 
(q_i,a_j) is said to be conditioned), or for all k, it makes 
the transition to q_k with positive probability (in which case 
(q_i,a_j) is said to be unconditioned)}. 
Definition: 𝒟_1(v) = {deterministic fsa's with alphabet A 
containing g elements (denoted by a_1,...,a_g), set of states Q 
containing h elements (denoted by q_1,...,q_h), initial state q_p 
and set of final states F, where q_i ∈ F ⇔ i ∈ H}.¹ 
If D ∈ 𝒟_1(v), I will say all states in D are conditioned. 
Definition: If D ∈ 𝒟_1(v), then (q_i,a_j) is said to be 
indifferent in D if ∀w ∈ A*, D accepts w independently of 
the state of conditioning of (q_i,a_j). 
All states that are inaccessible in D are indifferent in D. The converse is 
false. For example, if F = ∅ or F = Q, all states are indifferent, but the 
initial state, in particular, is not inaccessible. A non-trivial example 
would be the case of an fsa that ignores the first letter in each of the 
words it is presented with and never reenters the initial state. In such 
a case, the initial state is indifferent, but it is necessary in the sense 
that if the fsa has the minimum number of states possible (which can occur), 
it is impossible to delete the initial state from the set of states and 
still get an fsa that accepts the same words. 
Let A* be the set of all words in the alphabet A, and G ⊆ A* the set of 
words we want to be accepted. 
Definition: If A' ⊆ A*, then 𝒟_0(v,A') = {D ∈ 𝒟_1(v): D accepts all 
and only elements of A'}. 
The only case of interest is 𝒟_0(v,G). 
Definition: If D ∈ 𝒟_1(v), then A_D = {w ∈ A*: D accepts w}. 
Let Ā' and Ā_D be the complements of A' and A_D (relative to A*). 

¹ This is my original definition. I altered the definition in the final 
draft to make the proof a little slicker. Unfortunately this change made the 
proof invalid. Fortunately Nancy Moler called this (and sundry minor errors) 
to my attention before printing. 

Definition: If D ∈ 𝒟_1(v) and A' ⊆ A*, then Â_{D,A'} = (A_D ∩ Ā') ∪ (Ā_D ∩ A'). 
Again, the only case of interest is Â_{D,G}, which I shall simply write as 
Â_D. 
Definition: If D ∈ 𝒟_1(v), G ⊆ A* and t is a positive integer, then 
Â^t_D = {w ∈ Â_D: length(w) ≤ t}. 

The purpose in defining Â^t_D will be evident shortly. 
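
The set of short words on which a deterministic fsa disagrees with the 
target set G can be computed by direct enumeration. The sketch below is 
illustrative only: the function names accepts and error_set, the 
dictionary encoding of the transition table, the representation of G as 
a predicate in_G, the restriction to words of positive length, and the 
parity example are all assumptions of the sketch. 

```python
from itertools import product

def accepts(delta, start, finals, word):
    """Run a deterministic fsa on a word and report whether it accepts."""
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals

def error_set(delta, start, finals, alphabet, in_G, t):
    """Words of length 1..t on which the fsa and the target set G disagree."""
    errors = []
    for n in range(1, t + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if accepts(delta, start, finals, word) != in_G(word):
                errors.append(word)
    return errors

# Hypothetical target: words over {a, b} with an even number of b's.
in_G = lambda w: w.count("b") % 2 == 0
good = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0}
bad = {(0, "a"): 0, (0, "b"): 0, (1, "a"): 0, (1, "b"): 0}

good_errors = error_set(good, 0, {0}, "ab", in_G, 3)
bad_errors = error_set(bad, 0, {0}, "ab", in_G, 2)
```

An fsa whose error set is empty for every t accepts all and only the 
elements of G; a non-empty error set gives the words that can expose an 
incorrect transition table. 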

The problem of choosing the number of states, which was discussed 
earlier, is the same as choosing an appropriate h. g is determined by 
the task to be performed, but h, p and H must be chosen by the 
organism. The crucial step is choosing h, since once h is determined, 
p and H can be obtained fairly easily. As mentioned earlier, we found 
no way to determine h given the task, and simply assumed that enough 
states were present, which is expressed in the final theorem by the 
requirement that 𝒟_0(v,G) is non-empty. 

Let S = {σ_1,...,σ_g} be a set of stimuli, S* the set of words in 
S, and Z ⊆ S*. S corresponds to A, S* to A* and Z to G. I now want to 
define a class of S-R models S(v,Z) and show how this class corresponds 
to 𝒟(v) and 𝒟_0(v,G). Let 𝒮 be an element of S(v,Z). S and R are still 
taken to be primitive, but in the definition of 𝒮, the role of S is taken 
by R×S. The reason for this was indicated in the discussion at the 
beginning of this chapter. 

Definition: If 𝒮 = (R×S,R,R_y,E,μ,M,X,P) is an S-R model, 
and on each subtrial T_{n,m} = {(r_i,σ_j)} for some i and j, 
then σ_{n,m} is the element of S occurring in T_{n,m}, and 
σ*_n = σ_{n,1}σ_{n,2}...σ_{n,m_n}. 

Thus, σ*_n ∈ S*, and corresponds to a word in A*. 

Let f be the natural map from A ∪ Q onto S ∪ R, i.e., f(a_j) = σ_j and 
f(q_i) = r_i. f maps A* onto S* and pairs (q_i,a_j) onto pairs (r_i,σ_j). 
The relationship between S(v,Z) and 𝒟(v) is that for each 𝒮 in S(v,Z), 
f maps the set 𝒞 of possible conditioning functions of 𝒮 onto 𝒟(v). 
For C ∈ 𝒞, conditioned states in C correspond to conditioned states 
in the corresponding element D of 𝒟(v), and unconditioned states of 
C correspond to unconditioned states of D. Indeed, this fact is the 
reason for introducing the conditioning terminology into the definition of 
𝒟(v). If f(G) = Z, then 𝒟_0(v,G) corresponds to the set of possible correct 
values of the asymptotic conditioning function of all 𝒮 ∈ S(v,Z), i.e., if 
C ∈ 𝒞 is a possible asymptotic conditioning function of 𝒮 (𝒮 always responds 
correctly when the conditioning function is C) then ∃D ∈ 𝒟_0(v,G) s.t. all 
states in C are either conditioned the same as in D or are indifferent in D. 
Finally, corresponding to Â^t_D is Â^t_C. 

Definition: If 𝒮 = (R×S,R,R_y,E,μ,M,X,P) is an S-R model, 
Z ⊆ S*, and C a conditioning function of 𝒮 s.t. ∀x ∈ S*, 
if C_n = C and σ*_n = x, then P(r_{n,m_n} ∈ R_y) = 0 or 1, then 
A_C = {x ∈ S*: σ*_n = x ⇒ r_{n,m_n} ∈ R_y}, 
Â_C = {x ∈ S*: x ∈ (A_C ∩ Z̄) ∪ (Z ∩ Ā_C)} and 
Â^t_C = {x ∈ Â_C: length(x) ≤ t}. 

Whenever the notation A_C, Â_C or Â^t_C is used, it will be assumed that C 
satisfies the condition that ∀x ∈ S*, if C_n = C and σ*_n = x, then 
P(r_{n,m_n} ∈ R_y) = 0 or 1. 

Definition: If v = (g,h,p,H) is a signature and Z ⊆ S*, 
then S(v,Z) is the set of all S-R models 𝒮 = (R×S,R,R_y,E,μ,M,X,P) 
satisfying the following conditions: 

i) S has g elements, denoted by σ_1,...,σ_g. 

ii) R has h elements, denoted by r_1,...,r_h. 

iii) r_i ∈ R_y ⇔ i ∈ H. 

iv) ∀n, T_{n,1} = {(r_p,σ_j)} for some j. 

v) ∀n, ∀m s.t. 1 < m ≤ m_n, T_{n,m} = {(r_{n,m-1},σ_j)} for some j. 

vi) μ(S') is the cardinality of S' for S' ⊆ R×S. 

vii) ρ_i, the probability of responding r_i when no sampled 
stimuli are conditioned, is > 0. 

viii) e_1 occurs on trial n if and only if σ*_n ∈ Z & r_{n,m_n} ∈ R_y 
or σ*_n ∉ Z & r_{n,m_n} ∈ R_n. 

ix) ∀C s.t. ∀x ∈ S*, if C = C_n and σ*_n = x, then P(r_{n,m_n} ∈ R_y) = 0 or 
1, Â_C ≠ ∅ ⇒ (∃ε > 0) s.t. (∀n) P(σ*_n ∈ Â^t_C) > ε. 

Conditions i, ii, and iii guarantee a natural correspondence between 
A and S, Q and R, and F and R_y. Conditions iv and v guarantee that each 
T_{n,m} is a singleton, and since μ is the cardinality of a set, Axiom S1 
guarantees that its single element will be sampled. Thus, T_{n,m} = s_{n,m} 
for all n and m. I will henceforth say simply that a pair (r_i,σ_j) is 
the stimulus on the subtrial on which it occurs. Since each s_{n,m} is a 
singleton, Axiom R1 guarantees that if (r_i,σ_j) is conditioned, the 
response to which it is conditioned will be given with probability 1, 
while condition vii strengthens Axiom R2 so that if (r_i,σ_j) is unconditioned, 
each response has a positive probability of being given. Condition iv 
guarantees that the response in the first stimulus pair is r_p, which 
corresponds to the requirement that the initial state of the fsa's is 
q_p. Condition v guarantees that the stimulus on each succeeding subtrial 
consists of the previous response and an element σ_j of S. Altogether, 
this has the result that if σ*_n = f(w) and C_n = f(D), 𝒮 acts just like D 
would when presented with input w. Put more precisely, let w = a^w_1...a^w_t, 
where length(w) = t. Let q^w_0 be the initial state of D, and q^w_i 
the state D goes into after scanning a^w_i. The action of D on w is 
described completely by the following (2t+1)-tuple: 
(q^w_0, a^w_1, q^w_1, ..., a^w_t, q^w_t). 
Let r^n_0 be the element of R in the stimulus on the first subtrial of 
trial n, σ^n_i the element of S on subtrial i and r^n_i the response given on 
subtrial i. The action of 𝒮 on σ*_n is similarly described by the following 
(2m+1)-tuple, where m = m_n to avoid cumbersome notation: 
(r^n_0, σ^n_1, r^n_1, ..., σ^n_m, r^n_m). 
If D has no unconditioned states, f(D) = C_n, and σ*_n = f(w), then 
length(w) = m and the fact that D and 𝒮 act the same is shown by the fact 
that for all i and j, f(q^w_i) = r^n_i and f(a^w_j) = σ^n_j. If D has unconditioned 
states, these will correspond to unconditioned states in C_n. The function 
f doesn't say anything about the probabilities of the different responses, 
but the exact probabilities are inessential as long as they are all 
positive, and this is true of D since D ∈ 𝒟(v) and it is true for C_n because 
of condition vii. 
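
The correspondence just described can be checked mechanically in the 
fully conditioned case. The following sketch is only an illustration: 
the function names run_fsa and run_sr_trial, the string labels for 
states, letters, responses and stimuli, the dictionary f standing in for 
the natural map, and the particular two-state transition table are 
assumptions made for the example. 

```python
def run_fsa(delta, start, word):
    """Trace of a deterministic fsa D: the (2t+1)-tuple of states and letters."""
    trace, state = [start], start
    for letter in word:
        state = delta[(state, letter)]
        trace += [letter, state]
    return tuple(trace)

def run_sr_trial(conditioning, first_response, stimuli):
    """Trace of a fully conditioned S-R model on one trial."""
    trace, response = [first_response], first_response
    for sigma in stimuli:
        # The stimulus on each subtrial is the pair (previous response, sigma).
        response = conditioning[(response, sigma)]
        trace += [sigma, response]
    return tuple(trace)

# Stand-in for the natural map f: a_j -> sigma_j, q_i -> r_i.
f = {"a1": "s1", "a2": "s2", "q1": "r1", "q2": "r2"}
delta = {("q1", "a1"): "q1", ("q1", "a2"): "q2",
         ("q2", "a1"): "q2", ("q2", "a2"): "q1"}
# Conditioning function corresponding to D under f.
C = {(f[q], f[a]): f[nxt] for (q, a), nxt in delta.items()}

w = ["a1", "a2", "a2"]
fsa_trace = run_fsa(delta, "q1", w)
sr_trace = run_sr_trial(C, f["q1"], [f[a] for a in w])
```

Applying f to each entry of the fsa trace gives the S-R trace, which is 
the sense in which the model acts just like D on the word w. 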
Intuitively, Z is the subset of S* whose elements should get a yes 
answer. Thus, the requirement in condition viii that σ*_n ∈ Z & r_{n,m_n} ∈ R_y 
or σ*_n ∉ Z & r_{n,m_n} ∈ R_n is equivalent to saying that the answer given 
on trial n is correct. Condition viii is therefore equivalent to saying 
that a positive reinforcement occurs on trial n if and only if the answer 
given on trial n is correct. 
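
Condition viii amounts to a two-line rule. As a minimal sketch (the 
function name reinforcement and the string labels "e1" and "e2" are 
assumptions made for the illustration): 

```python
def reinforcement(word, final_response, Z, R_y):
    """Return "e1" if the answer on the trial is correct, otherwise "e2"."""
    said_yes = final_response in R_y  # final response lies in R_y
    should_say_yes = word in Z        # the presented word should be accepted
    return "e1" if said_yes == should_say_yes else "e2"

Z = {"s1s1"}
R_y = {"r1"}
outcomes = [reinforcement("s1s1", "r1", Z, R_y),
            reinforcement("s1s1", "r2", Z, R_y),
            reinforcement("s1s2", "r2", Z, R_y),
            reinforcement("s1s2", "r1", Z, R_y)]
```

Only the final response and the presented word enter the rule, so the 
reinforcement carries no direct information about individual transitions. 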

Â_C is the set of all elements of S* to which 𝒮 gives an incorrect 
response if C_n = C. If Â_C ≠ ∅, then the conditioning function is incorrect. 
Condition ix requires that stimuli be presented that will cause incorrect 
conditioning functions to be deconditioned. It is sufficient to require 
only that P(σ*_n ∈ Â^t_C) > ε for C's that answer deterministically, and this is 
why Â_C was defined only for these C's. 

Theorem: (∀v)(∀G)[𝒟_0(v,G) ≠ ∅ ⇒ (∀𝒮 ∈ S(v,f(G))) 
lim_{n→∞} P(r_{n,m_n} ∈ R_y ⇔ σ*_n ∈ f(G)) = 1]. 

Proof. Suppose 𝒮 ∈ S(v,f(G)). The condition that r_{n,m_n} ∈ R_y ⇔ σ*_n ∈ f(G) is 
equivalent to u(C_n) ∈ 𝒟_0(v,G), where u is the mapping f^{-1} and u(C_n) ∈ 
𝒟_0(v,G) means ∃D ∈ 𝒟_0(v,G) such that all states in u(C_n) are conditioned 
the same as in D or are indifferent in D. The strategy of the proof is 
similar to that of SRTFA; I will show that on each trial there is a 
positive probability of incorrectly conditioned states becoming 
deconditioned, and of not indifferent states becoming correctly conditioned. 
This will be done in two lemmas, but first, two definitions and one 
preliminary fact are needed. 

Definition: (∀C ∈ 𝒞)(∀D ∈ 𝒟_1(v)) W(D,C) = {(r_i,σ_j): (r_i,σ_j) 
is conditioned in C, but is conditioned to a different 
response in C than in f(D)}. 

Definition: ∀D ∈ 𝒟_1(v), F^D_n is the event of all responses on 
trial n being compatible with C_n = f(D). 
F^D_n is a rather special event, since D ∈ 𝒟_1(v) means all states in f(D) 
are conditioned. Let ρ = min ρ_i. By condition vii, ρ > 0. If m_n ≤ t, 
and W(D,C_n) = ∅, then P(F^D_n) ≥ ρ^t, since for each subtrial on which an 
unconditioned stimulus which is conditioned in f(D) occurs, the 
probability is at least ρ of the response being given to which the stimulus 
is conditioned in f(D). Since there are at most t subtrials, the result 
follows. 



Lemma 1. (∀n)[(∀D ∈ 𝒟_0(v,G)) W(D,C_n) ≠ ∅ ⇒ (∃δ' > 0) s.t. (∀D ∈ 𝒟_0(v,G)) 
P(at least one element in W(D,C_n) is deconditioned and no remaining stimuli 
are conditioned on trial n) > δ']. 

Proof. Let n be any trial number and D' ∈ 𝒟_0(v,G). Let D be such 
that f(D) has all states in W(D',C_n) conditioned as in C_n, and all other 
states conditioned as in D'. W(D,C_n) = ∅, so D ∉ 𝒟_0(v,G) and Â_D ≠ ∅. This 
means that Â_{f(D)} ≠ ∅ and hence, by condition ix, P(σ*_n ∈ Â^t_{f(D)}) > ε. If 
σ*_n ∈ Â^t_{f(D)}, length(σ*_n) ≤ t. Since W(D,C_n) = ∅, P(F^D_n) ≥ ρ^t. If 
F^D_n occurs, then P(e_{2,n}) = 1, since Â_{f(D)} is the subset of S* to which 𝒮 
responds incorrectly if C_n = f(D). Thus, on at least one subtrial a 
conditioned stimulus, say (r_i,σ_j), must have been in W(D',C_n). If no 
(r_i,σ_j) ∈ W(D',C_n), then F^D_n is equivalent to F^{D'}_n, since all the states 
not in W(D',C_n) are conditioned the same in D and D'. Since F^D_n occurs, 
F^{D'}_n occurs. But this is impossible, since if F^{D'}_n occurs, P(e_{1,n}) = 1 
because D' ∈ 𝒟_0(v,G). Since (r_i,σ_j) was a stimulus on some subtrial and 
e_{2,n} occurs, by axiom C6 (r_i,σ_j) will be deconditioned with probability d. 
Putting this together, 
P((r_i,σ_j) is deconditioned on trial n) ≥ d·P(σ*_n ∈ Â^t_{f(D)} and F^D_n) > dρ^tε. 
Taking δ' = dρ^tε we get the desired result, since by axioms C5 and C7 
no stimuli can be conditioned when e_{2,n} occurs. 

Lemma 2. (∀n)(∀D ∈ 𝒟_0(v,G))[W(D,C_n) = ∅ ⇒ (∃δ'' > 0) s.t. ∀ pairs (r_i,σ_j) 
which are unconditioned in C_n and not indifferent in D, P((r_i,σ_j) is 
conditioned and no state is conditioned differently than in D on trial 
n) > δ'']. 

Proof. Let n be any trial number and D' ∈ 𝒟_0(v,G) be such that W(D',C_n) = ∅. 
If no such D' exists, there is nothing to prove. If there are no 
unconditioned stimuli that are not indifferent in D', there is likewise 
nothing to prove, so assume there is at least one such, say (r_i,σ_j). 
Let r_k be the response that (r_i,σ_j) is conditioned to in D' and let 
D ∈ 𝒟_1(v) be conditioned the same as D' except that (r_i,σ_j) is 
conditioned to r_l. r_l and D exist since (r_i,σ_j) is not indifferent in 
D'. By an argument similar to that of lemma 1, P(σ*_n ∈ Â^t_{f(D)}) > ε, and 
P(F^D_n) ≥ ρ^t. Also, P(F^{D'}_n) ≥ ρ^t since W(D',C_n) = ∅. If F^D_n occurs, 
P(e_{2,n}) = 1, while if F^{D'}_n occurs, P(e_{1,n}) = 1. Since D and D' differ 
only in the way (r_i,σ_j) is conditioned, and since different answers occur 
in the event of F^D_n and F^{D'}_n, (r_i,σ_j) must be the stimulus on some 
subtrial. Putting this together, P(σ*_n ∈ Â^t_{f(D)} and F^{D'}_n) > ρ^tε. Hence, 
by axiom C2, P((r_i,σ_j) is conditioned on trial n) > cρ^tε. Letting 
δ'' = cρ^tε we get the desired result, since the fact that F^{D'}_n and e_{1,n} 
occurred means no state can be conditioned differently than in D' by 
axioms C3, C4, and C7. 

For convenience, let δ = min(δ',δ''). u(C_n) ∈ 𝒟_0(v,G) if and only if 
∃D ∈ 𝒟_0(v,G) s.t. W(D,C_n) = ∅ and C_n has no unconditioned states that are 
not indifferent in D. Let k be any trial and C_k any conditioning 
function. Choose D ∈ 𝒟_0(v,G) s.t. W(D,C_k) has the minimum number of 
elements, say m. Lemma 1 guarantees that there is a probability δ^m that 
there will be at least one element D' of 𝒟_0(v,G) such that, for some 
k ≤ k' ≤ k+m, W(D',C_{k'}) = ∅. k' might be less than k + m, since more than 
one state can be deconditioned on a trial. Moreover, lemma 1 doesn't 
guarantee that D = D', for it can't be applied unless for all elements B 
of 𝒟_0(v,G), W(B,C_n) ≠ ∅, and it is possible that on some trial k' there 
is a D' ≠ D such that W(D',C_{k'}) = ∅, so that lemma 1 will be inapplicable. 
Also, there is a probability that some of the correctly conditioned 
states will be deconditioned. Both of these cases are all right, since 
lemma 2 requires only that there be a D', and does not specify that any 
state in D' must be conditioned. In a sense, lemma 1 applies to the 
worst possible case, and the only cases where it might not apply are where 
what we want to happen has already occurred. Let D' be such that 
W(D',C_{k'}) = ∅, and let m' be the number of unconditioned states that are 
not indifferent in D'. By lemma 2, on each trial P(such a state is 
conditioned and no state is conditioned differently than in D') > δ, so 
after m' trials, P(no such states) > δ^{m'}. 

Once this occurs, the correct answer occurs with probability 1, 
so by condition viii, e_1 occurs with probability 1. By axioms C4 and 
C7, the conditioning of all conditioned states remains the same. Thus, 
only the conditioning of unconditioned states, which must be indifferent, 
can be changed, and if this occurs, the u of the resulting conditioning 
function is still in 𝒟_0(v,G). 

m and m' are always ≤ gh, so no matter what C_k is, ∃k' ≤ k+2gh such 
that P(u(C_{k'}) ∈ 𝒟_0(v,G)) > δ^{2gh}. By what was said in the preceding 
paragraph, P(u(C_{k+2gh}) ∈ 𝒟_0(v,G)) > δ^{2gh}. Let k(n) be the greatest 
integer in n/2gh. Then, regardless of the initial state of conditioning 
of 𝒮, P(u(C_n) ∉ 𝒟_0(v,G)) < (1 - δ^{2gh})^{k(n)}. This approaches 0 as n 
approaches infinity, so P(u(C_n) ∈ 𝒟_0(v,G)) approaches 1 as n approaches 
infinity. Q.E.D. 
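
To get a feeling for how conservative this guarantee is, the bound on the 
probability of not yet having a correct conditioning function can be 
evaluated numerically. This is a sketch under assumptions: it takes the 
bound in the form (1 - δ^{2gh})^{k(n)} with k(n) the greatest integer in 
n/2gh, and the function name failure_bound and the sample values of δ, g 
and h are chosen only for illustration. 

```python
def failure_bound(n, delta, g, h):
    """Upper bound on P(conditioning function is still incorrect on trial n)."""
    block = 2 * g * h                 # trials needed for one guaranteed attempt
    return (1 - delta ** block) ** (n // block)

small_case = failure_bound(4, 0.5, 1, 1)        # (1 - 0.25) ** 2
two_by_two = [failure_bound(n, 0.1, 2, 2) for n in (8, 80, 800)]
```

Even for g = h = 2 the bound stays close to 1 for hundreds of trials when 
δ is small, which is why it says little about the actual rate of learning. 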

A few remarks concerning this theorem are in order. The theorem
gives a lower bound on the rate of learning, but the actual rate of
learning will be much faster than this lower bound. In the usual case,
the original conditioning function will have all states unconditioned,
while the theorem allows for the possibility that all states are conditioned
incorrectly. The lower bound also does not use the fact that more than
one state can be conditioned (deconditioned) on a given trial. Moreover,
$\delta$ was calculated using the minimum of the $p_i$, so the fact that there is
a higher probability of some responses being given, and hence being
conditioned or deconditioned, is ignored. Also, the minimum of c and d
is chosen. Very importantly, it takes a sequence of gh trials to get the
guaranteed result of the theorem, while in most cases a much shorter
sequence is all that is necessary. Also, it is certainly possible for
some states to be conditioned correctly even if $W(D,C_k) \neq 0$, which is
not taken account of by the theorem. Finally, even if [...] is not a
member of the appropriate [...] or if [...] does not occur, there is a
probability that some states will be correctly conditioned or that some
incorrectly conditioned states will be deconditioned. Although it is
obvious that the actual learning rate is much faster than the lower bound
given by the theorem, calculating an actual expectation would be brutal.
Thus, I have no precise results on how fast learning would actually
occur. However, it is probably true that the process as it stands would
be adequate for only fairly simple tasks, since it would be too slow
for more complex tasks. There are five reasons that this may not be
as severe a limitation as might at first seem. First, it may turn out
best to think of learning a complex task as combining previously learned
simple tasks, and that it is only the simple tasks that have to be
learned by the above method. Secondly, it would not be surprising if
such basic learning took place slowly, although perhaps not as slowly as
the above setup requires. Thirdly, it might be possible to keep the
above framework essentially unchanged and make some adjustments to get
a faster rate of learning. Fourthly, in the above work, as in most
psychological experiments, the rate of learning is given in terms of the
number of trials needed to learn the task, while in ordinary talk, the
rate of learning is given in terms of the amount of time needed. The
relationship between number of trials and amount of time is not very
clear. It may be that a large number of trials corresponds to a short
period of time, in which case the fact that the above learning requires
many trials may not be a serious fault. Lastly, the above learning
takes place with the minimal amount of information given on each trial,
since all the reinforcement does is tell whether or not the final
response is correct. No indication is given of where mistakes occurred
or what the right procedure would have been. Most learning situations
contain this other information, and when it is excluded in an artificial
situation, the learning task is indeed made much more difficult.

Our work was an attempt to build a mathematical model of a device
that could learn geometry. The best way to visualize it is to think of
it as an attempt to connect some type of mathematical learning model
to geometry. This is the general plan, but to get a specific problem,
it is necessary to choose a particular type of learning model; and to
make the problem manageable, one has to limit oneself to a fragment of
geometry.

The learning model we chose was an S-R model. This choice is not
unproblematical, for cognitive psychologists and linguists like Chomsky
have denied the adequacy of S-R models for the type of learning we
wanted. Their remarks have been mostly about language learning, but they
are also applicable to our work with geometry. I have mentioned
similarities between our work and linguistics in the previous chapters,
and I will indicate shortly that the situation as far as language
learning is concerned is similar to our present situation.

There are four reasons for choosing the S-R model in spite of the
criticism it has received. First and foremost, it is the only learning
model with any degree of mathematical sophistication. In choosing a
learning model, the S-R model wins almost by default, for its critics
have not produced a serious competitor. Chomsky, for example, makes a
few remarks that indicate he thinks some sort of enumeration procedure
is what is needed. He speaks of a device for learning language operating
by selecting one member of the class of potential grammars on the basis
of primary linguistic data. These remarks are not developed into a
precise formal theory, however. Secondly, in the terminology of Chapter
5, Chomsky favors the enumeration method over the brute force method.
While abstract criticism of the brute force method seems plausible, when
it comes down to making a concrete choice between brute force and
enumeration, it seems to me that our decision to concentrate on brute
force is correct. Thirdly, the criticisms of S-R models have consisted
of claims, not proofs, that they are inadequate. Whether or not they
are in fact inadequate is an open question until such a proof is given.
This leads me to my final point, which is that S-R theory is very much
a live area today, and modifications and improvements of S-R models
are still being given. A really convincing proof of the inadequacy of
S-R models would have to show not only that all present models are
inadequate, but that it would be impossible to develop an adequate model
within the S-R tradition. This would require the formulation of certain
properties that S-R models must have. This formulation is currently
lacking (I don't see how it could be given at the present time), and
hence any proof of inadequacy is out of the question. Behind these last
remarks is the view that a proof of inadequacy of any model of a certain
type requires much more precision and rigor than proving the adequacy of
a certain model, a point which Professor Suppes is fond of making. What
has actually happened, I think, is that critics of S-R models have
leveled their criticisms at early, fairly undeveloped versions of the
model, and tended to ignore more recent developments and the possibilities
of developing more adequate models within the S-R tradition.

The fragment of geometry we considered is that part of geometry
that can be encoded in the codings given in Chapter 2. Codings apply
only to two-dimensional, straight-line drawings, and the only geometrical
predicates that apply to such drawings which can be recognized from a
coding are 'connected' and those involving the recognition of polygons.
However, this fragment could be augmented by adding further information
and then coupled with the artificial intelligence work to get a device
that could deal with real-life situations. It is not clear exactly
how to do this, but the fragment we chose to concentrate on seems to be
the natural starting place for such a project.

We did not try to connect an S-R model with this coded fragment of
geometry directly. In between the S-R model and the fragment of geometry
are two types of processing devices, fsa's and Turing machines. One
doesn't want to deal with general Turing machines, since these devices
have virtually unlimited calculating power. To get a realistic model
of human behavior, it will be necessary to place some restriction on
the type of Turing machine calculations that are acceptable. These
restricted calculations I will call 'Turing-type procedures,' and it is
these that we are interested in, though just what restrictions should be
made is not clear. Certainly, there should be some sort of limit on
the size of the machine and length of calculations involved, and perhaps
other restrictions would also be desirable.

Thus, there are four originally unrelated elements that we dealt
with: S-R models, fsa's, Turing-type procedures, and the fragment of
geometry. The purpose of the work with the coding in Chapter 2 is to
provide a connection between the fragment of geometry and Turing-type
procedures. The purpose of the work on learning theory in Chapter 5
was to strengthen the connection between S-R models and fsa's that was
established in SRTPA. Schematically, the situation as it is now is given
by the following diagram:

[Diagram: S-R models, fsa's, Turing-type procedures, and the fragment of
geometry, with the two possible connections marked Method 1 and Method 2]

Diagram 1
Present Status of the Technical Problem


Our ultimate goal is to connect S-R models to the fragment of geometry,
and, as the diagram indicates, this has not been accomplished. Considerable
progress has been made, however. What we did was to start at the two
ends of the problem and try to meet in the middle, and we weren't quite
successful.

There are two possible ways of completing the connection. It is
not possible to connect fsa's and Turing-type procedures directly,
since the latter are provably more powerful. One method would be to try
to connect the fragment of geometry to fsa's, which is indicated in the
diagram as Method 1. I don't believe this method can be completely
satisfactory, since the problems in dealing with the codings
in terms of fsa's that were mentioned in Chapter 2 seem to be
fundamental. This method might be partially satisfactory, however.
The work in Chapter 2 should prove useful in making this connection, if
it can be made. Method 2 seems to me to be much more promising. An fsa
is essentially a set of triples, while a Turing machine is a set of
quadruples, and there is no reason to believe that an S-R model cannot
become a Turing machine at asymptote. The work in Chapter 5 would be a
useful first step in making this connection. Whichever method is
chosen, either the work on the coding or the work on learning will be a
necessary link in the final chain connecting S-R models and the fragment
of geometry, and the work that is replaced (Chapter 2 if Method 1 is
chosen, Chapter 5 if Method 2 is) should be useful in making the final
connection. Thus, considerable progress in solving the problem has been made.
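
The contrast between triples and quadruples can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the quadruple convention (state, scanned symbol, action, next state), with the action being a move "L" or "R" or a symbol to write, is one common textbook formulation and is assumed here rather than taken from the text; the parity automaton, the function names, and the step budget standing in for a restriction on the length of calculations are all hypothetical. The sketch shows only the easy direction, that any set of fsa triples can be rewritten as right-moving quadruples.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]          # (state, input symbol, next state)
Quadruple = Tuple[str, str, str, str]  # (state, scanned symbol, action, next state)
BLANK = "B"                            # tape symbols must differ from "L" and "R"

# Hypothetical example: an fsa that tracks the parity of the number of 1s read.
PARITY: Set[Triple] = {
    ("even", "0", "even"), ("even", "1", "odd"),
    ("odd", "0", "odd"), ("odd", "1", "even"),
}


def run_fsa(triples: Set[Triple], start: str, accepting: Set[str], word: str) -> bool:
    """Feed `word` to the fsa and report whether it ends in an accepting state."""
    table = {(q, a): r for (q, a, r) in triples}
    state = start
    for a in word:
        if (state, a) not in table:
            return False
        state = table[(state, a)]
    return state in accepting


def fsa_to_quadruples(triples: Set[Triple]) -> Set[Quadruple]:
    """Rewrite each triple as a quadruple that scans, moves right, changes state."""
    return {(q, a, "R", r) for (q, a, r) in triples}


def run_tm(quads: Set[Quadruple], start: str, tape: str,
           max_steps: int) -> Optional[Tuple[str, str]]:
    """Run until no quadruple applies; return None if the step budget runs out."""
    table = {(q, s): (act, r) for (q, s, act, r) in quads}
    cells = dict(enumerate(tape))
    state, pos, steps = start, 0, 0
    while True:
        sym = cells.get(pos, BLANK)
        if (state, sym) not in table:  # halted: no applicable quadruple
            if not cells:
                return state, ""
            lo, hi = min(cells), max(cells)
            return state, "".join(cells.get(i, BLANK) for i in range(lo, hi + 1))
        if steps == max_steps:         # calculation exceeds the allowed length
            return None
        act, state = table[(state, sym)]
        if act == "R":
            pos += 1
        elif act == "L":
            pos -= 1
        else:
            cells[pos] = act           # any other action writes that symbol
        steps += 1
```

For instance, `run_tm(fsa_to_quadruples(PARITY), "even", "1001", max_steps=10)` halts in the same state that `run_fsa` reaches on the same word, while a budget smaller than the word length cuts the calculation off.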

An analogy with linguistics can be drawn by replacing 'fragment of
geometry' with 'natural languages.' The work of Chomsky, among others,
has consisted primarily of an attempt to establish the connection between
natural languages and transformational grammars, which are an example
of what I called a Turing-type procedure. This work corresponds to the
work in Chapter 2, though it has, of course, been more extensively developed
than our work. Not much work on learning has been done by linguists,
so it is hard to say what the left half of the diagram should look like.
If they did start with an S-R model, the result would be exactly like
Diagram 1, including the corresponding gap and methods for filling it.
Thus, the remaining part of our problem is similar to problems in other 
areas, and solving it would have far-reaching implications.

This completes the picture of the overall problem. The status of
our work in learning theory and what future developments might be have
already been indicated, so in conclusion, I want to make some remarks
on the present status of the coding problem and what future developments
might occur. Working only with the coding places severe restrictions
on the geometry that can be done. The size of angles and the length
of line segments are not included in the coding, and it is impossible
to distinguish concave from convex figures, what is inside a polygon
from what is outside, and the simple regions in a figure. Whether a
figure is connected and the various polygons that it contains are all
that one can hope to distinguish. Thus, theorems 1, 2, and 3 of
Chapter 2 take care of the positive results that might be expected
since they show that 'connected' and 'polygon' can be recognized.
Moreover, they give reasonable procedures for accomplishing this. One
problem with all three theorems, as Professor Hintikka pointed out in
regard to theorem 3, is that each requires arbitrary choices and gives
no strategy for making these choices. Finding optimal strategies, or
discovering whether particular strategies make much difference, is a
natural problem that has not been solved.
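
As a rough illustration of what recognizing 'connected' from a set of lines might involve, here is a minimal Python sketch. It does not reproduce the procedures of Chapter 2. It assumes, purely for illustration, that a coding can be represented as a list of lines, each line being the sequence of labelled points that lie on it in order; the function name and this representation are hypothetical. The starting point of the search is picked arbitrarily, which is one small instance of the kind of arbitrary choice mentioned above, though for connectedness the answer does not depend on it.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Sequence, Set


def is_connected(coding: Iterable[Sequence[Hashable]]) -> bool:
    """Return True if every point can be reached from every other along lines.

    Assumed representation: each line is the ordered sequence of labelled
    points lying on it; consecutive points on a line are joined by a segment.
    """
    adjacent: Dict[Hashable, Set[Hashable]] = {}
    for line in coding:
        points = list(line)
        for p in points:
            adjacent.setdefault(p, set())
        for p, q in zip(points, points[1:]):
            adjacent[p].add(q)
            adjacent[q].add(p)
    if not adjacent:
        return True  # an empty figure is treated as trivially connected

    start = next(iter(adjacent))  # arbitrary choice of starting point
    seen = {start}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in adjacent[p]:
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return len(seen) == len(adjacent)
```

Under this representation a triangle given as three lines is connected, and adding a separate segment that shares no point with it makes the figure disconnected.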

The fact that the coding contains such a limited amount of
information means that something will have to be done to get more
information. One way of doing this would be to allow what we called
construction operations, i.e., allow auxiliary lines to be added to a
figure and hence to a coding for it. This requires going back to the
original figure, and this is undesirable. For example, it would be
possible to recognize inside/outside by complicated procedures for adding
lines, but this is very counterintuitive. A better method, it seems
to me, would be to augment the coding by adding information concerning,
say, inside/outside. Unfortunately, I have no suggestions on what
would be the best way to do this. It seems as though it would be easy
to add information concerning the length of line segments and sizes of
angles, but such things as inside/outside are more difficult.

Finally, the most important unsolved problems concern the relationship
between codings (or sets of lines) and figures. The outstanding problem
is formulating necessary and sufficient conditions for a good set of
lines to be a coding. This turns out to be a difficult problem, but
there doesn't seem to be any reason a general solution can't be given.
The problem would be easier if more information were added to the coding,
but it should be solvable without this extra information. Related to
this problem are the twin problems of under what groups of transformations
codings remain invariant and what the classes of figures that have the
same, or equivalent, codings look like.